
More complete Ensembl gene ID -> HGNC symbol pair table

Open · ElderMedic opened this issue 1 year ago · 1 comment

Is your feature request related to a problem? Please describe.
Running data=ov.bulk.Matrix_ID_mapping(data,'ref/genesets/pair_GRCh38.tsv') left more than 20k Ensembl gene IDs unconverted (H. sapiens, GRCh38; over 30% of all genes in the counts). I was trying to build a more complete table.

Describe alternatives you've considered
I selected just the approved symbols and Ensembl gene IDs on the HGNC custom download page: https://www.genenames.org/download/custom/ I then removed all rows containing NaN and saved the result as a TSV table. With that table, all of my gene IDs are mapped.
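The table-building step described above can be sketched roughly as follows. This is a minimal sketch, not the exact procedure used for the attached file. It assumes the HGNC export has the column headers "Approved symbol" and "Ensembl gene ID", and that the pair table should have the Ensembl ID as its first column and a column named "symbol"; both are assumptions, so check them against your own download and against pair_GRCh38.tsv. The function name build_pair_table and the placeholder row FAKE1 are made up for illustration.

```python
import io

import pandas as pd


def build_pair_table(hgnc_tsv) -> pd.DataFrame:
    """Build an Ensembl ID -> symbol table from an HGNC custom download.

    Assumed input headers: "Approved symbol", "Ensembl gene ID".
    Assumed output layout: index "gene_id", one column "symbol".
    """
    hgnc = pd.read_csv(hgnc_tsv, sep="\t", dtype=str)
    # Keep only the two columns of interest and drop rows missing either value.
    pairs = hgnc[["Ensembl gene ID", "Approved symbol"]].dropna()
    pairs = pairs.rename(
        columns={"Ensembl gene ID": "gene_id", "Approved symbol": "symbol"}
    )
    # Keep the first symbol if an Ensembl ID appears more than once.
    pairs = pairs.drop_duplicates(subset="gene_id")
    return pairs.set_index("gene_id")


# Tiny in-memory stand-in for the downloaded file; FAKE1 is a made-up
# entry with no Ensembl ID, so it is dropped.
example = io.StringIO(
    "Approved symbol\tEnsembl gene ID\n"
    "A1BG\tENSG00000121410\n"
    "FAKE1\t\n"
    "TP53\tENSG00000141510\n"
)
pair_table = build_pair_table(example)
print(pair_table)
```

To write the result to disk, pair_table.to_csv("pair_hgnc_all.tsv", sep="\t") would produce a tab-separated file with the Ensembl ID in the first column.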

Additional context
See the attached gene ID mapping table.

pair_hgnc_all.tsv.tar.gz

ElderMedic · Mar 20 '24 10:03

I just discovered that if I run ov.bulk.Matrix_ID_mapping(data,'ref/genesets/pair_hgnc_all.tsv'), unmapped genes are dropped from the dataframe. So the function may need an option to keep genes that are not in the gene ID pair table instead of removing them.
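As a possible workaround until the function offers this, the mapping can be done by hand so that unmapped rows keep their original Ensembl ID. This is only a sketch of the requested behaviour, not omicverse's implementation. The helper name map_ids_keep_unmapped is hypothetical, the pair table is assumed to be indexed by Ensembl ID with a "symbol" column, and ENSG00000000000 is a made-up ID standing in for a gene missing from the table.

```python
import pandas as pd


def map_ids_keep_unmapped(data: pd.DataFrame, pair: pd.DataFrame) -> pd.DataFrame:
    """Rename the index from Ensembl IDs to symbols, keeping unmapped rows.

    Rows whose ID is absent from `pair` keep their original ID instead of
    being dropped. Duplicate symbols are not merged here.
    """
    mapping = pair["symbol"].to_dict()
    out = data.copy()
    # Fall back to the original ID when no symbol is available.
    out.index = [mapping.get(gene_id, gene_id) for gene_id in out.index]
    return out


# Toy counts matrix: two known genes and one made-up, unmapped ID.
counts = pd.DataFrame(
    {"sample_1": [5, 0, 12]},
    index=["ENSG00000141510", "ENSG00000000000", "ENSG00000121410"],
)
pair = pd.DataFrame(
    {"symbol": ["A1BG", "TP53"]},
    index=["ENSG00000121410", "ENSG00000141510"],
)
mapped = map_ids_keep_unmapped(counts, pair)
print(mapped)
```

If several Ensembl IDs map to the same symbol, the result will contain duplicate index labels, so a follow-up step such as mapped.groupby(level=0).sum() may be needed depending on the analysis.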

ElderMedic · Mar 20 '24 11:03