This protocol is a walk-through tour of three popular functional enrichment analysis tools: Enrichr, WebGestalt, and Interactive Enrichment Analysis.
This tour can be used with your own data, or with an example dataset. Three example datasets are provided in the following slides.
The first dataset is from the TCGA lung cancer study and compares transcript expression in lung cancer biopsies versus normal tissue. The data was processed to produce the example files:
These input files can be used for ORA and GSEA analysis in Enrichr and WebGestalt.
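As a rough illustration of how inputs of this kind might be derived from a differential expression table, the sketch below builds an ORA gene list (up-regulated genes passing cutoffs) and a GSEA-style ranked list. The gene names, the row layout (gene, log2 fold change, p-value), the cutoffs, and the ranking score are all assumptions made for illustration; this is not the processing that produced the example files.

```python
import math

# Hypothetical differential expression rows: (gene, log2 fold change, p-value).
ROWS = [
    ("GENE_A", 2.5, 0.0001),
    ("GENE_B", -1.8, 0.001),
    ("GENE_C", 0.2, 0.6),
    ("GENE_D", 1.2, 0.01),
]

def ora_gene_list(rows, min_log2fc=1.0, max_p=0.05):
    """Up-regulated genes that pass both (assumed) cutoffs, for ORA input."""
    return [gene for gene, log2fc, p in rows if log2fc >= min_log2fc and p <= max_p]

def gsea_ranked_list(rows):
    """All genes ranked by sign(log2FC) * -log10(p), highest score first.

    Assumes every p-value is greater than zero.
    """
    scored = [(gene, math.copysign(-math.log10(p), log2fc)) for gene, log2fc, p in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    print(ora_gene_list(ROWS))
    for gene, score in gsea_ranked_list(ROWS):
        print(gene, round(score, 3))
```

ORA only sees the genes that pass the cutoffs, while GSEA receives every measured gene with a score, which is why the two methods need differently shaped input files.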
The Pinto et al. study is a multi-omics study of SARS-CoV-2 host responses in lung epithelial cells. The data files below were adapted from the supplementary data files provided with the publication, which were already filtered.
These input files can be used for ORA analysis in Enrichr and WebGestalt.
The Voineagu et al. study compares the transcriptomes of autistic and normal brain; the data was downloaded from Expression Atlas. It contains data for all genes measured, with the following comma-separated columns:
This data is used for ORA and GSEA analysis in the
The Enrichr tool offers an easy-to-use interface for basic Over-Representation Analysis (ORA) for a large number of gene set libraries.
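To make the idea behind ORA concrete, here is a minimal sketch of the usual over-representation test: a one-sided hypergeometric tail probability for the overlap between an input gene list and a gene set. The function name and parameters are illustrative, and Enrichr's own scoring may differ in detail.

```python
from math import comb

def ora_p_value(n_background, n_in_set, n_input, n_overlap):
    """Probability of seeing at least n_overlap gene-set members when
    n_input genes are drawn without replacement from n_background genes,
    of which n_in_set belong to the gene set."""
    max_overlap = min(n_in_set, n_input)
    total = comb(n_background, n_input)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_input - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

if __name__ == "__main__":
    # Toy numbers: 20 background genes, a 5-gene set, 4 input genes, 2 in common.
    print(ora_p_value(20, 5, 4, 2))
```

A small p-value means the overlap is larger than expected by chance, given the sizes of the background, the gene set, and the input list.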
Before starting analysis, you can browse the available gene sets under
Enrichr includes over 200 gene set libraries, covering Gene Ontology terms, pathways, disease-associated sets, cell type markers and more.
You can also search for gene sets either by the term name under
Analysis is started by simply copying the list of genes from an input text file into the input box on the right of the
Note that this corresponds to the TCGA example data file containing up-regulated genes.
The results are displayed as a grid of libraries for each functional category (at the top). The image below shows the results in the
Clicking on one of the squares will show the detailed results for that library:
Looking at the results for a specific library in detail, the gene sets are sorted by p-value by default. Clicking any of the bars re-sorts the bar graph by a different score.
To download an image of the bar chart, click either of the
In addition to the default bar chart, results are also available as a table under
At the bottom of the table there is a link to
WebGestalt offers more advanced analysis options for a smaller number of gene set libraries. Using WebGestalt, it is possible to run Over-Representation Analysis (ORA), Gene Set Enrichment Analysis (GSEA) or Network Topology-based Analysis (NTA).
To start analysis, the
The input data is defined in the
Clicking
The top of the results page includes a job summary and a link to download the full results.
Results are displayed as a bar chart by default. Right-clicking on the bar chart lets you download in either
Clicking on the bars updates the pathway-specific display at the bottom of the page.
The results specific to a gene set (pathway in this case) include the scoring statistics calculated for the enrichment, including the enrichment score, a sortable table and the enrichment plot.
The enrichment plot is described in detail here. Briefly, the plot in the upper half shows the running enrichment score (ES) as the analysis walks down the ranked list of genes (bottom section), starting at the most highly ranked gene. When a gene in the ranked list belongs to the pathway, the score goes up; when it does not, the score goes down. The positions of the pathway genes in the ranked list are marked by lines in the middle section. The peak of the running score in the enrichment plot (its largest deviation from zero) is the score reported for the particular gene set.
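The walk described above can be sketched in a few lines. This is a simplified, unweighted version of the running score (every hit adds the same step); GSEA implementations, including the one in WebGestalt, typically weight hits by the ranking metric, so the function below is an illustration and not the tool's algorithm. The function name and the toy gene names are hypothetical.

```python
def running_enrichment_score(ranked_genes, gene_set):
    """Return (running scores, enrichment score) for an unweighted walk.

    The score steps up at each gene that is in gene_set and down otherwise;
    the enrichment score is the running value farthest from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("the ranked list must contain both hits and misses")
    running, score = [], 0.0
    for is_hit in hits:
        score += 1.0 / n_hits if is_hit else -1.0 / n_miss
        running.append(score)
    es = max(running, key=abs)  # maximum deviation from zero
    return running, es

if __name__ == "__main__":
    ranked = ["A", "B", "C", "D", "E", "F"]  # most highly ranked gene first
    running, es = running_enrichment_score(ranked, {"A", "B", "E"})
    print([round(x, 3) for x in running], round(es, 3))
```

Because the up and down steps are normalized by the number of hits and misses, the walk always returns to zero at the end of the list; a gene set concentrated near the top of the ranking produces an early, high peak.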
For WikiPathways results, the link in the upper left is clickable and will open a pathway view with the overlapping/leading edge genes highlighted.
In addition to the bar chart, the main results overview can also be visualized as a table or volcano plot by clicking the
The volcano plot has options for customized downloads available, as well as pan/zoom controls.
Interactive Enrichment Analysis is a user-friendly interactive tool for performing enrichment analysis on multiple datasets across multiple public databases. This tool can run both Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA). Follow these steps to get started:
if (!require(devtools)) install.packages(c("devtools", "httr"))  # install if missing
library(devtools); library(httr)
options(shiny.launch.browser = .rs.invokeShinyWindowExternal)  # open the app in an external browser (RStudio)
source_url("https://raw.github.com/gladstone-institutes/Interactive-Enrichment-Analysis/main/launch_app.R")  # download and launch the tool
This will install some basic dependencies, download the project to your working directory (Files tab in RStudio) and launch the tool in your browser.
The tool will open in your browser:
A set of public database collections is provided in the drop-down in the
Once a database collection is selected, the specific databases contained in the collection will be displayed.
Input datasets are selected in the
The tool will display a preview of the first few rows of the first chosen dataset along with the required and optional columns that were detected.
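A check of this kind can be sketched as follows. The column names used here (`gene` as required, `p.value` and `fold.change` as optional) are assumptions for illustration; which columns the tool actually treats as required or optional is not stated in this section, and the function is not part of the tool.

```python
import csv
import io

REQUIRED = {"gene"}                    # assumed required column
OPTIONAL = {"p.value", "fold.change"}  # assumed optional columns

def detect_columns(csv_text, preview_rows=3):
    """Report missing required columns, detected optional columns,
    and the first few data rows of a comma-separated dataset."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    preview = [row for _, row in zip(range(preview_rows), reader)]
    return {
        "missing_required": sorted(REQUIRED - header),
        "optional_found": sorted(OPTIONAL & header),
        "preview": preview,
    }

if __name__ == "__main__":
    sample = "gene,p.value,rank\nGENE_A,0.01,1\nGENE_B,0.2,2\n"
    print(detect_columns(sample))
```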
When
Once analysis is started, the initial set of panels will collapse and a
Analysis results can be viewed by clicking
In the results app, the
The input data is visualized in a volcano plot, which plots genes by statistical significance (p.value) versus magnitude of change (fold.change). Genes are highlighted based on the p.value and fold.change cutoffs selected during setup. Selected genes are labeled; genes can be selected either as the top n genes or by name.
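The logic behind such a plot can be sketched without any plotting library: each gene gets an x coordinate (fold change), a y coordinate (-log10 of the p-value) and a highlight status from the cutoffs. The sketch assumes fold change is on a log2 scale and uses illustrative cutoff values and function names; none of these come from the tool itself.

```python
import math

def volcano_points(rows, p_cutoff=0.05, fc_cutoff=1.0):
    """rows: iterable of (gene, log2 fold change, p-value) with p > 0."""
    points = []
    for gene, log2fc, p in rows:
        if p <= p_cutoff and log2fc >= fc_cutoff:
            status = "up"
        elif p <= p_cutoff and log2fc <= -fc_cutoff:
            status = "down"
        else:
            status = "not significant"
        points.append({"gene": gene, "x": log2fc, "y": -math.log10(p), "status": status})
    return points

def top_n_labels(points, n):
    """Names of the n most significant highlighted genes, to be labeled."""
    highlighted = [pt for pt in points if pt["status"] != "not significant"]
    highlighted.sort(key=lambda pt: pt["y"], reverse=True)
    return [pt["gene"] for pt in highlighted[:n]]

if __name__ == "__main__":
    rows = [("A", 2.5, 1e-4), ("B", -1.8, 1e-3), ("C", 0.2, 0.6), ("D", 1.2, 0.01)]
    pts = volcano_points(rows)
    print([(pt["gene"], pt["status"]) for pt in pts])
    print(top_n_labels(pts, 2))
```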
A bar plot of your input data is also available (via a drop-down), highlighting positive and negative fold-change values for a subset of genes, either top n genes or genes selected by name.
The
The table and plots can all be downloaded.
Using the left side panel, you can switch between the GSEA and ORA methods while viewing the results for a particular database, to explore hits that are shared between the methods or unique to one of them. Here we compare the ORA and GSEA results for the WikiPathways database; note the differences in the table and dot plot.
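If the result tables are downloaded, the same comparison can be made with simple set operations. The sketch below assumes each method's significant terms are available as a list of names; the pathway names and the function name are hypothetical.

```python
def compare_hits(ora_terms, gsea_terms):
    """Split enriched terms into those found by both methods and by only one."""
    ora, gsea = set(ora_terms), set(gsea_terms)
    return {
        "both": sorted(ora & gsea),
        "ora_only": sorted(ora - gsea),
        "gsea_only": sorted(gsea - ora),
    }

if __name__ == "__main__":
    ora_hits = ["Pathway 1", "Pathway 2", "Pathway 3"]    # hypothetical ORA hits
    gsea_hits = ["Pathway 2", "Pathway 3", "Pathway 4"]   # hypothetical GSEA hits
    print(compare_hits(ora_hits, gsea_hits))
```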
A
Several plot types are available for visualizing the gene overlap between results.
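One common way to quantify gene overlap between enriched terms is the Jaccard index (shared genes divided by all genes in either term). The sketch below is a generic illustration under that assumption; it is not necessarily the measure used by the tool's plots, and the term and gene names are made up.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared genes divided by all genes in either set (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_overlap(term_genes):
    """term_genes: dict mapping an enriched term to its list of genes."""
    return {
        (t1, t2): jaccard(term_genes[t1], term_genes[t2])
        for t1, t2 in combinations(sorted(term_genes), 2)
    }

if __name__ == "__main__":
    terms = {"T1": ["A", "B", "C"], "T2": ["B", "C", "D"], "T3": ["E"]}
    for pair, score in pairwise_overlap(terms).items():
        print(pair, score)
```

Terms with a high overlap often describe the same underlying biology, so it can be useful to interpret them together.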
For each result in the results table, result-specific plots are available based on the database the result is from and the analysis method used. For example, for GSEA analysis, the GSEA Enrichment Score plot is available. As described earlier, the plot in the upper half represents the running enrichment score (ES) as the analysis walks down the ranked list of genes. The middle section shows where the members of the gene set appear in the ranked list of genes. The peak of the running score in the enrichment plot is the score reported for the particular gene set.
Some database-specific visualizations are also available. For example, WikiPathways results can be visualized on pathway models from the WikiPathways database. Genes that met the fold-change and p-value cutoffs (for ORA), or leading edge genes (for GSEA), are highlighted in the pathway (orange for up, blue for down). The WikiPathways button will take you to the pathway model at WikiPathways.
Opening the pathway in a new window gives you access to a larger, interactive format of the pathway.
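The highlighting rule for pathway views (orange for up, blue for down, selected genes only) can be written out as a small mapping. This sketch assumes fold change is signed (for example log2), so that the sign gives the direction of change; the function and gene names are hypothetical and not part of the tool.

```python
def pathway_highlights(pathway_genes, fold_change, selected):
    """Map each selected pathway gene to a highlight color.

    pathway_genes: genes drawn on the pathway model
    fold_change:   dict of gene -> signed fold change (assumed log2)
    selected:      genes passing the ORA cutoffs, or GSEA leading edge genes
    """
    colors = {}
    for gene in pathway_genes:
        if gene not in selected or gene not in fold_change:
            continue  # unselected or unmeasured genes are left unhighlighted
        colors[gene] = "orange" if fold_change[gene] > 0 else "blue"
    return colors

if __name__ == "__main__":
    print(pathway_highlights(
        ["G1", "G2", "G3", "G4"],
        {"G1": 1.7, "G2": -2.1, "G3": 0.1},
        {"G1", "G2"},
    ))
```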
Once you have finished exploring your data and results in any of these tools, there are several options for continued analysis and exploration: