Announcing Upgraded AI Gene Annotations in Metascape

In 2023, Metascape introduced AI-generated summaries for gene functions and disease implications [see previous Blog]. Powered by ChatGPT-3 at the time, these summaries served as a highly beneficial complement to NCBI’s human annotations by filling critical gaps in disease context.

In our latest database update, we are thrilled to release a completely revamped annotation dataset generated using ChatGPT-5 via Copilot. The leap in quality, scientific precision, and clinical relevance is remarkable.

To validate this upgrade, we manually reviewed multiple gene entries across the spectrum, from heavily studied genes like TP53 to lesser-known targets like CCDC74A. We also employed Google Gemini as an independent AI reviewer to compare the old and new annotations in a blind test, with the two versions presented only as Version A and Version B. The result? Gemini preferred the new version in every comparison.
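For readers who want to run a similar comparison, here is a minimal sketch of how a blind A/B setup could be organized. It is not the Metascape pipeline: the names `BlindedPair`, `blind_pair`, and `new_version_win_rate` are hypothetical, the annotation strings are stand-ins, and the reviewer call itself is left out. The verdicts in the usage lines are placeholders, not real reviewer output.

```python
import random
from dataclasses import dataclass


@dataclass
class BlindedPair:
    gene: str
    version_a: str
    version_b: str
    new_label: str  # "A" or "B": which label hides the new annotation


def blind_pair(gene, old_text, new_text, rng):
    """Randomly assign the old and new annotations to labels A and B."""
    if rng.random() < 0.5:
        return BlindedPair(gene, new_text, old_text, "A")
    return BlindedPair(gene, old_text, new_text, "B")


def new_version_win_rate(pairs, verdicts):
    """Fraction of genes where the reviewer's pick matches the hidden new label."""
    wins = sum(1 for p in pairs if verdicts[p.gene] == p.new_label)
    return wins / len(pairs)


# Usage with stand-in text; a real run would pass the actual annotations
# and collect each verdict ("A" or "B") from the independent reviewer.
rng = random.Random(0)
pairs = [
    blind_pair("TP53", "old TP53 text", "new TP53 text", rng),
    blind_pair("CCDC74A", "old CCDC74A text", "new CCDC74A text", rng),
]
# Placeholder verdicts (assumption): pretend the reviewer always picked the new version.
verdicts = {p.gene: p.new_label for p in pairs}
print(new_version_win_rate(pairs, verdicts))
```

Randomizing which label carries the new text keeps the reviewer from learning a positional pattern, and storing `new_label` separately lets the verdicts be unblinded only after all of them are collected.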

We hope our biomedical research community finds this contribution of AI development valuable. Below are two examples highlighting the dramatic improvements you can expect to see.

Example 1: TP53 (A Well-Understood Gene)

Even for highly characterized genes with mature 2023 annotations, the new version delivers significant upgrades in nuance, structure, and clinical accuracy.

The 2023 Version (ChatGPT-3)

Function: The TP53 gene encodes a tumor suppressor protein that responds to various cellular stresses to regulate the expression of target genes, inducing cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. The protein contains transcriptional activation, DNA binding, and oligomerization domains. Mutations in this gene are associated with various human cancers, including hereditary cancers such as Li-Fraumeni syndrome. Alternative splicing and the use of alternate promoters result in multiple transcript variants and isoforms. Additional isoforms have also been shown to result from the use of alternate translation initiation codons from identical transcript variants.

Disease: The TP53 gene is one of the most frequently mutated genes in human cancers, with mutations occurring in over 50% of all tumors. These mutations can lead to loss of function of the TP53 protein, resulting in uncontrolled cell growth and tumor formation. As a result, there has been significant interest in developing drugs that can restore TP53 function or target tumors with TP53 mutations. One example of a successful drug targeting TP53 mutations is the small molecule PRIMA-1, which has been shown to restore TP53 function in vitro and in vivo. Another promising approach is the use of gene therapy to introduce functional TP53 into tumor cells. Several clinical trials are currently underway to evaluate the safety and efficacy of these approaches in cancer patients.

The New Version (ChatGPT-5)

Function: TP53 encodes the tumor suppressor p53, a transcription factor that safeguards genomic integrity. In response to DNA damage, oncogene activation, hypoxia, or other stresses, p53 activates or represses target genes controlling cell cycle arrest, apoptosis, senescence, DNA repair, and metabolism. It functions as a sequence-specific DNA-binding tetramer with transactivation and oligomerization domains, integrating diverse stress signals into coordinated cellular outcomes. Alternative isoforms and post-translational modifications fine-tune p53 activity across tissues and contexts. Because of its central role in preventing malignant transformation, TP53 is often called the “guardian of the genome.”

Disease: TP53 is the most frequently mutated gene in human cancer, with loss-of-function or dominant-negative mutations driving genomic instability, therapy resistance, and aggressive tumor behavior. Germline TP53 mutations cause Li-Fraumeni syndrome, predisposing carriers to early-onset sarcomas, breast cancer, brain tumors, and leukemias. Drug discovery has focused on restoring or exploiting p53 pathways, including MDM2 inhibitors (e.g., idasanutlin in trials) that stabilize wild-type p53, and small molecules like eprenetapopt (APR-246) designed to refold certain mutant p53 proteins. While no universal p53-restoring drug is approved, many standard chemotherapies and radiotherapy rely on p53-mediated cell death, and p53 pathway modulators remain a major oncology research frontier.

Google Gemini’s Review:

Verdict: The New Version (provided as Version B) is clearly superior.

Version B offers a more precise, scientifically rigorous narrative, rightly highlighting p53 as the “guardian of the genome” and explaining its structural function as a tetramer. In the disease section, Version B provides critical clinical specifics, explicitly detailing the cancers associated with Li-Fraumeni syndrome. Furthermore, its therapeutic landscape is much more up-to-date and nuanced. Instead of broadly referencing older compounds like PRIMA-1 or generic gene therapy, it accurately cites modern clinical targets like MDM2 inhibitors (idasanutlin) and advanced refolding molecules (eprenetapopt).

Example 2: CCDC74A (A Lesser-Understood Gene)

For newly discovered or less-characterized genes, the older model often struggled with organization and relied on vague generalizations. The new model connects structural motifs directly to functional biology.

The 2023 Version (ChatGPT-3)

There is limited information available on targeted drug discovery efforts for CCDC74A, as it is a relatively newly discovered gene. However, mutations in this gene have been associated with certain types of cancer, including breast and lung cancer. This suggests that targeting CCDC74A may be a potential strategy for cancer treatment. Currently, there are no drugs on the market that specifically target CCDC74A. However, there are several drugs that target other proteins involved in cell division, such as mitotic inhibitors and cyclin-dependent kinase inhibitors, which may indirectly affect CCDC74A function. Further research is needed to fully understand the role of CCDC74A in cancer and to develop targeted therapies for this gene.

CCDC74A is a human gene that encodes for a protein called coiled-coil domain containing 74A. This protein is involved in the regulation of cell division and is expressed in various tissues throughout the body. Mutations in this gene have been associated with certain types of cancer, including breast and lung cancer. Additionally, CCDC74A has been shown to interact with other proteins involved in cell division, suggesting that it may play a role in the development and progression of cancer. Further research is needed to fully understand the function of CCDC74A and its potential as a therapeutic target for cancer treatment.

The New Version (ChatGPT-5)

CCDC74A encodes a coiled-coil domain-containing protein, a structural motif commonly involved in oligomerization and assembly of cytoskeletal or scaffolding complexes. Although its precise molecular partners are not fully defined, CCDC74A localizes to centrosomal or spindle-associated regions in some studies, suggesting a role in mitotic spindle organization, chromosome segregation, or microtubule dynamics. By contributing to the architecture of protein complexes that coordinate cell division, CCDC74A may influence proliferation, genome stability, and cell fate decisions in dividing cells.

The disease implications of CCDC74A are still being uncovered, but altered expression has been reported in certain cancers, where it may support uncontrolled proliferation or chromosomal instability. Disruption of spindle- or centrosome-associated proteins often leads to aneuploidy and tumor progression, making CCDC74A a potential contributor to oncogenic cell cycle phenotypes. No approved drugs directly target CCDC74A, yet tumors that depend on its function may be particularly sensitive to microtubule-targeting agents or mitotic checkpoint inhibitors already in clinical use. As functional genomics clarifies its role, CCDC74A could emerge as a biomarker or vulnerability in specific cancer subtypes.

Google Gemini’s Review:

Verdict: The New Version (provided as Version A) is far superior.

It provides a well-structured, scientifically precise explanation, offering specific mechanistic insights into CCDC74A’s role in centrosomal localization, spindle organization, and microtubule dynamics. Instead of relying on vague statements about “cell division,” Version A uses accurate terminology to connect the protein’s coiled-coil structural motif to its functional biology. Furthermore, Version B is poorly organized, placing drug discovery before basic function and repeating the exact same point about breast and lung cancer in both paragraphs. Version A flows logically from molecular function to disease implications, correctly identifying how chromosomal instability relates to existing microtubule-targeting therapies.
