Gene Annotation by ChatGPT

We are excited to leverage the power of ChatGPT to improve Metascape! ChatGPT is a powerful language model that has gained broad biological knowledge by digesting text from the Internet. GPT-4 scores a 5 on the AP Biology exam and in the 99th to 100th percentile on the 2020 USA Biology Olympiad Semifinal Exam [1]. Given appropriate prompts, ChatGPT can summarize gene functions rather impressively.

Standing on the shoulders of NCBI’s gene summaries, ChatGPT describes gene functions equally well, if not better. Readers can compare the following paragraph on the gene TP53 to NCBI’s version [2]:

The TP53 gene encodes a tumor suppressor protein that responds to various cellular stresses to regulate the expression of target genes, inducing cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. The protein contains transcriptional activation, DNA binding, and oligomerization domains. Mutations in this gene are associated with various human cancers, including hereditary cancers such as Li-Fraumeni syndrome. Alternative splicing and the use of alternate promoters result in multiple transcript variants and isoforms. Additional isoforms have also been shown to result from the use of alternate translation initiation codons from identical transcript variants.

Importantly, ChatGPT summarizes disease and drug information that is critical for biomedical researchers but not yet captured by NCBI’s gene summary:

The TP53 gene is one of the most frequently mutated genes in human cancers, with mutations occurring in over 50% of all tumors. These mutations can lead to loss of function of the TP53 protein, resulting in uncontrolled cell growth and tumor formation. As a result, there has been significant interest in developing drugs that can restore TP53 function or target tumors with TP53 mutations. One example of a successful drug targeting TP53 mutations is the small molecule PRIMA-1, which has been shown to restore TP53 function in vitro and in vivo. Another promising approach is the use of gene therapy to introduce functional TP53 into tumor cells. Several clinical trials are currently underway to evaluate the safety and efficacy of these approaches in cancer patients.

ChatGPT’s ability to automate knowledge summarization becomes invaluable when little or no human curation exists. While NCBI provides no gene summary for CCDC74A [3], ChatGPT writes:

There is limited information available on targeted drug discovery efforts for CCDC74A, as it is a relatively newly discovered gene. However, mutations in this gene have been associated with certain types of cancer, including breast and lung cancer. This suggests that targeting CCDC74A may be a potential strategy for cancer treatment. Currently, there are no drugs on the market that specifically target CCDC74A. However, there are several drugs that target other proteins involved in cell division, such as mitotic inhibitors and cyclin-dependent kinase inhibitors, which may indirectly affect CCDC74A function. Further research is needed to fully understand the role of CCDC74A in cancer and to develop targeted therapies for this gene.

and

CCDC74A is a human gene that encodes for a protein called coiled-coil domain containing 74A. This protein is involved in the regulation of cell division and is expressed in various tissues throughout the body. Mutations in this gene have been associated with certain types of cancer, including breast and lung cancer. Additionally, CCDC74A has been shown to interact with other proteins involved in cell division, suggesting that it may play a role in the development and progression of cancer. Further research is needed to fully understand the function of CCDC74A and its potential as a therapeutic target for cancer treatment.

ChatGPT’s summaries are now automatically included for all protein-coding human genes in Metascape’s Gene Annotation analyses. Two annotation columns, “Protein Functions (ChatGPT)” and “Disease & Drugs (ChatGPT)”, are added to the Excel sheet after Metascape analysis. We believe this new feature will help Metascape users review and identify gene candidates more efficiently and effectively. Just be mindful that the annotations were extracted from ChatGPT without any human curation; caution and verification are needed before precious time and resources are invested in further characterizing gene candidates.
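For readers who post-process their results in a script, here is a minimal sketch of how the two ChatGPT columns might be pulled into a review list with every entry marked as unverified. It assumes the annotation sheet has been exported to CSV; the gene-symbol column name `Gene Symbol`, the helper `chatgpt_review_list`, and the sample row are hypothetical illustrations, not part of Metascape.

```python
import csv
import io

# The two column names are quoted from this post.
FUNCTION_COL = "Protein Functions (ChatGPT)"
DISEASE_COL = "Disease & Drugs (ChatGPT)"
# Hypothetical name for the gene-symbol column; check your own sheet's header.
SYMBOL_COL = "Gene Symbol"


def chatgpt_review_list(csv_text):
    """Return one dict per gene with the two ChatGPT columns, marked unverified."""
    reader = csv.DictReader(io.StringIO(csv_text))
    review = []
    for row in reader:
        review.append({
            "gene": row.get(SYMBOL_COL, ""),
            "function": row.get(FUNCTION_COL, ""),
            "disease_drugs": row.get(DISEASE_COL, ""),
            # Set to True only after checking the text against primary sources.
            "verified": False,
        })
    return review


# Made-up sample standing in for an exported annotation sheet.
sample = (
    "Gene Symbol,Protein Functions (ChatGPT),Disease & Drugs (ChatGPT)\n"
    "TP53,Tumor suppressor protein.,Frequently mutated in human cancers.\n"
)

for entry in chatgpt_review_list(sample):
    print(entry["gene"], "->", entry["function"])
```

Keeping an explicit `verified` flag next to each summary is one simple way to track which machine-generated annotations have actually been checked by a human.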

We hope you find this new feature useful!

References

  1. https://openai.com/research/gpt-4
  2. https://www.ncbi.nlm.nih.gov/gene/7157
  3. https://www.ncbi.nlm.nih.gov/gene/90557