Watch out gene symbols within Excel

[This is an old blog written on Wednesday, February 10, 2016]

We do not recommend use gene symbols (or synonyms) as primary identifier within a gene list, especially within Excel. Excel irreversibly converts certain symbols into dates and it becomes much worse when gene synonyms are used.

We checked all primary human gene symbols, Excel automatically converts the following 35 symbols into dates: FEB1, FEB2, FEB5, FEB6, FEB7, FEB9, FEB10, MARCH1, MARC1, MARCH2, MARC2, MARCH3, MARCH4, MARCH5, MARCH6, MARCH7, MARCH8, MARCH9, MARCH10, MARCH11, SEPT1, SEPT2, SEPT3, SEPT4, SEPT5, SEPT6, SEPT7, SEPT8, SEPT9, SEPT10, SEPT11, SEPT12, SEPT14, SEP15, DEC1.

The conversion is irreversible for two reasons.  First, the original symbol is lost and the cell only stores an integer representing the number of days since Jan 1, 1900.  Second, notice both MARCH1 and MARC1 map to the same date, as well as MARCH2 and MARC2.

The situation become even worse, if we allow gene synonyms.  E.g., SEP53 (Gene ID 49860) becomes Sep, 1953, 2E4 (Gene ID 11133) becomes 20000. 9-27 (Gene ID 8519) becomes Sep 27th.

It is a wild west, when we look into mouse and rat.  There are primary symbols such as 201E9, 9130022E09, 3e46, NA, NaN, etc.

How to fix a gene symbol in Excel? To enter MARCH1 into a cell, type ‘MARCH1 (prefix it with a single quote). This hints Excel to preserve the input, while the single quote is nicely invisible to Excel formula and in data export.

For all the reasons above, we recommend other gene identifier types to be used with Excel.  Metascape supports Entrez Gene ID, RefSeq, UniProt, Ensembl and UCSC identifiers, which all work peacefully with Excel.

This entry was posted in Comment and tagged , , . Bookmark the permalink.