
Include "Gene Type" field for Gene (and GeneValueObject)

Open: nzllim opened this issue 7 years ago • 1 comment

During analysis, a frequently raised question is how many genes are protein-coding and how many are not. This information could be stored in Gemma for convenient access in the future.

The gene type can be found in the "type_of_gene" column of this file after unpacking: ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz

*Note that the decompressed file is larger than 2 GB.

Below is a list of possible gene types used (correct as of April 2018), semi-colon delimited: other; protein-coding; pseudo; tRNA; ncRNA; miscRNA; rRNA; unknown; snoRNA; snRNA; scRNA; biological-region
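
To get a quick answer to the protein-coding question before anything is stored in Gemma, the file can be tallied directly. The following is a minimal Python sketch, not Gemma code: the function name `count_gene_types` and its `tax_id` filter are made up for illustration, and it assumes the file is tab-delimited with a header line that starts with `#` and names the `tax_id` and `type_of_gene` columns. It reads the gzip stream line by line, so the >2GB decompressed content never has to sit on disk or in memory.

```python
import gzip
from collections import Counter


def count_gene_types(path, tax_id=None):
    """Tally the type_of_gene column of a gzipped gene_info file.

    Hypothetical helper for illustration; assumes a tab-delimited file
    whose first line is a header beginning with '#'.
    """
    counts = Counter()
    # Text mode streams the decompressed content one line at a time.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        header = handle.readline().rstrip("\n").lstrip("#").split("\t")
        # Look the columns up by name instead of hard-coding positions.
        type_col = header.index("type_of_gene")
        tax_col = header.index("tax_id")
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= max(type_col, tax_col):
                continue  # skip blank or truncated lines
            if tax_id is not None and fields[tax_col] != str(tax_id):
                continue  # optional filter on a single taxon
            counts[fields[type_col]] += 1
    return counts
```

For example, `count_gene_types("gene_info.gz", tax_id=9606)` would restrict the tally to human genes, and the `"protein-coding"` entry of the returned counter can then be compared against the sum of the others.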

nzllim, Jul 04 '18 19:07

gene_info is the file we already use, and we already parse the gene type from it, but the parsed value is never used. It is held in NCBIGeneInfo, so adding this would be trivial.

CHROMOSOME_FEATURE has a TYPE column but it is unpopulated and I don't believe it is part of the data model.

Backfilling this information would be an extra task, because the updateGene process only handles updates. As it stands, the field would be populated only for newly added genes, or perhaps when genes are updated in other ways.

ppavlidis, Apr 11 '23 20:04