Metascape is a gene list analysis website designed for biologists

Background

The analysis of genomics and proteomics datasets is not an easy task. Most current gene list analysis sites are limited to pathway enrichment analysis, so the value of omics-scale datasets is not fully explored. A more comprehensive analysis is often beyond the reach of biologists without strong bioinformatics support. In the era of big data, where large-scale biological datasets have become more readily available, the Metascape website (http://metascape.org) provides a convenient way for biologists to understand experimental data more efficiently and effectively. The Metascape team recently published an article entitled “Metascape provides a biologist-oriented resource for the analysis of systems-level datasets” in Nature Communications (DOI: 10.1038/s41467-019-09234-6).

Introduction

Metascape integrates more than forty bioinformatics knowledgebases and provides a simple interface that allows biologists to carry out a one-click Express Analysis and obtain comprehensive results. It not only includes pathway enrichment analysis, protein interaction network structure analysis, and rich gene annotation functions, but also presents the results in high-quality graphics that biologists can readily understand. Compared to other tools, Metascape aims to address challenges such as steep learning curves, obsolete databases, and difficulties in interpreting results.

Features

Metascape is very easy to use. Users submit a gene list and click the Express Analysis button. Metascape automatically recognizes all commonly used gene and protein identifiers. After the analysis is complete, the web page guides the user to open an Analysis Report. The report mimics the format of a scientific research paper, and its figures and tables are designed to be extremely friendly to biologists. The report explains both the analysis methods and the graphics in detail. Every graphic comes in a high-definition file format that is publication ready. The report also provides a formatted Excel file, which many articles use directly as a supplementary table. The automatically generated PowerPoint file makes it convenient for scholars to communicate their research. All data and figure files can be downloaded and saved as a Zip file package. The protein network file format also supports further analysis with third-party software such as Cytoscape. As users learn more, they can also use the Custom Analysis button to access additional analysis functions and adjust parameters.

Figure 1. Analysis results automatically generated by Metascape.

Figure 2 shows the main analysis results of Metascape using a list of 121 influenza host factors as an example.

Figure 2. a) Metascape removes functionally redundant enriched pathways and shows the most important results directly in a bar graph. b) Enriched biological pathways can be represented as a network, which makes the relationships among biological pathways or processes easier to understand. c) Metascape automatically extracts the protein interaction network contained in the input list. d) To make the network easier to understand, Metascape uses the established MCODE algorithm to find densely connected protein neighborhoods in the network, and the biological role of each component is annotated as well.
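
For readers who want a feel for what "pathway enrichment" means numerically, here is a minimal Python sketch of the hypergeometric test, a common way to score whether a pathway is over-represented in a gene list. It is an illustration under stated assumptions, not Metascape's own code: the function name `hypergeom_enrichment_p` is made up for this post, and the background size, pathway size, and overlap in the example are hypothetical. Only the list size of 121 comes from the influenza example above.

```python
from math import comb


def hypergeom_enrichment_p(background_size, pathway_size, list_size, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    background_size: number of genes in the background (e.g. the genome)
    pathway_size:    number of background genes annotated to the pathway
    list_size:       number of genes in the submitted gene list
    overlap:         number of list genes that belong to the pathway
    """
    max_overlap = min(pathway_size, list_size)
    total = comb(background_size, list_size)
    # Sum the probability of every outcome at least as extreme as the one observed.
    tail = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, list_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: a 121-gene list, a 200-gene pathway,
# a 20,000-gene background, and 10 genes in common.
p_value = hypergeom_enrichment_p(20000, 200, 121, 10)
```

A small p-value here means that an overlap this large would rarely occur by chance. In practice many pathways are tested at once, so some multiple-testing correction is normally applied on top of the raw p-values.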

Modern multi-omics experiments often generate multiple gene lists, yet current web tools rarely analyze and integrate several gene lists at once. This is precisely one of Metascape’s strengths. In fact, the “meta” in Metascape is derived from multi-list meta-analysis. Figure 3 illustrates an example using three independently published gene sets of influenza host factors.

Figure 3. a) Metascape uses a heatmap to make the shared and unique biological pathways among the three datasets visible at a glance. b) The enriched pathways can also be presented as a network. Since each set of host factors is represented by a unique color, it is evident that “viral gene expression” is shared by all three datasets, while “regulation of cell development” is found mainly in the experiment shown in green.
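
The idea of shared versus unique pathways across several gene lists can be sketched in a few lines of Python. This is only a toy illustration of the set logic, not how Metascape computes its heatmap: the helper `shared_and_unique`, the list names `study_a`, `study_b`, and `study_c`, and the term "RNA transport" are hypothetical, while the two other pathway names are taken from the Figure 3 caption.

```python
def shared_and_unique(terms_by_list):
    """Split enriched terms into those shared by every list and those unique to one list."""
    term_sets = {name: set(terms) for name, terms in terms_by_list.items()}
    # Terms enriched in every gene list.
    shared = set.intersection(*term_sets.values()) if term_sets else set()
    unique = {}
    for name, terms in term_sets.items():
        # Collect the terms seen in any other list, then subtract them.
        others = set().union(*(t for other, t in term_sets.items() if other != name))
        unique[name] = terms - others
    return shared, unique


# Hypothetical enriched terms for three gene lists.
enriched = {
    "study_a": ["viral gene expression", "RNA transport"],
    "study_b": ["viral gene expression", "RNA transport"],
    "study_c": ["viral gene expression", "regulation of cell development"],
}
shared, unique = shared_and_unique(enriched)
```

With this toy input, `shared` contains only "viral gene expression" and `unique["study_c"]` contains only "regulation of cell development", which mirrors the pattern described in the caption. A real comparison would also keep the enrichment scores, which is what a heatmap displays.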

Metascape provides biologists with new forms of data representation that are very effective in presenting results. Some papers even use multiple Metascape graphics in their figures, such as the two examples in Figure 4. Readers may encounter Metascape-style charts from time to time in the literature.

Figure 4. a) Taken from Figure 5 in Lotan et al. Molecular Psychiatry (2018) 78:865; b) taken from Figure 3 in Dong et al. Genome Biol (2018) 19:31.

Comments

Many biologists still use DAVID for pathway enrichment analysis. The results of a pathway enrichment analysis depend largely on the quality of the backend knowledgebase. DAVID once went six years without a database update (2010-2016), and its latest update was two and a half years ago. An independent study has shown that users relying on a two-year-old Gene Ontology database lose an average of 20% of the latest biological insights. The importance of regularly updating the database therefore cannot be overemphasized. Unfortunately, the reality is that only 40% of the popular enrichment analysis web tools are reasonably maintained. Readers should ask which category their current favorite tool falls into. Metascape updates more than 40 backend databases monthly to ensure the most accurate results.

Metascape eliminates the learning curve, because nothing can be easier than a one-click Express Analysis. Nevertheless, Metascape does not trade key functionality for convenience. Having spent a lot of time carrying out similar bioinformatics analyses in our own research over the years, we decided to implement and automate the best practices in Metascape. Commonly used gene list analysis tools mostly provide only single-gene-list enrichment analysis, which unfortunately leads to the misunderstanding that gene list analysis is equivalent to knowledge-driven enrichment analysis. As a result, data-driven protein interaction network analysis is rarely supported by websites. Metascape attempts to rectify that. In addition to the analysis functions described above, it also offers powerful features for annotating thousands of genes and for using a knowledgebase for membership analysis, all of which can be extremely helpful in triaging candidates for downstream validation. Please refer to the article or the website documentation for details. The analysis capabilities implemented in Metascape are considered difficult even for bioinformaticians.

The Metascape website was cited more than 350 times before it was officially published in Nature Communications; the citing papers include articles in Nature, Science, Cell, etc. Interestingly, about two-thirds of the citations use charts and sheets generated by Metascape. This may be because the design of these charts and sheets is based on the authors' many years of research experience. We have seen such examples in Figure 4 above.

We encourage biomedical researchers to try Metascape and compare it to the tools they currently use. If Metascape helps you improve your research efficiency in some way, we will be extremely happy. Readers interested in the background of Metascape can read another blog here.

Note: this blog post is largely based on a Google Translate rendering of the original Chinese version; Google Translate is a very impressive AI product.
