cbioportal
alias for gene symbol seems to work only in one direction
From user: "The alias endpoint doesn’t seem to work in both directions for gene aliases. For example, if you get aliases for KMT2D it will return MLL2 but if you get aliases for MLL2 it won’t return KMT2D"
We should figure out if this is the expected behavior
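One possible explanation is that the alias data is stored keyed by official symbol. In that case a lookup only succeeds when the query is itself an official symbol. This has not been confirmed against the cBioPortal source. The sketch below is a minimal illustration under that assumption. It uses a hypothetical one-entry table (`ALIASES_BY_SYMBOL`, built from the KMT2D/MLL2 example in the report) and hypothetical helpers (`forward_lookup`, `build_reverse_index`, `synonyms`). It shows the asymmetry, and how inverting the table makes the lookup work from either side.

```python
from collections import defaultdict

# Hypothetical table keyed by official symbol (official -> aliases), i.e. the
# *assumed* shape of the data behind the alias endpoint. Symbols are uppercase.
ALIASES_BY_SYMBOL = {
    "KMT2D": ["MLL2"],
}

def forward_lookup(symbol, aliases_by_symbol):
    """One-directional lookup: only succeeds when `symbol` is an official key."""
    return list(aliases_by_symbol.get(symbol.strip().upper(), []))

def build_reverse_index(aliases_by_symbol):
    """Invert official -> aliases into alias -> set of official symbols.

    A set is used because one alias can be shared by several official symbols.
    """
    reverse = defaultdict(set)
    for official, aliases in aliases_by_symbol.items():
        for alias in aliases:
            reverse[alias].add(official)
    return reverse

def synonyms(symbol, aliases_by_symbol, reverse_index):
    """Return every related symbol, whichever direction the query comes from."""
    symbol = symbol.strip().upper()
    related = set(aliases_by_symbol.get(symbol, []))       # official -> aliases
    for official in reverse_index.get(symbol, set()):      # alias -> official(s)
        related.add(official)
        related.update(aliases_by_symbol[official])        # sibling aliases too
    related.discard(symbol)                                # don't echo the query
    return sorted(related)
```

With this table, `forward_lookup("KMT2D", ...)` returns `["MLL2"]` while `forward_lookup("MLL2", ...)` returns `[]`, matching the reported behavior; `synonyms` returns `["KMT2D"]` for `MLL2` and `["MLL2"]` for `KMT2D`.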
Thanks for opening this @inodb! I've outlined some use cases below that are hopefully helpful:
- Sometimes we have to combine MAF files from several studies, and the way certain genes are coded may differ between studies (or even within a study). For example, the `data_mutations_mkscc` file for the `nsclc_pd1_msk_2018` study has both MLL* and KMT2* symbols in it. If we run these as-is through our processing/analysis scripts, they show up as two separate genes. To find where we need to merge, we may run the Hugo symbol column through the alias search, and the MLL* genes return no results. There are ways we can code around this, but I'm pointing it out because it wasn't what I expected.
- When an API query requires that we search by gene, we want to include all potential aliases in that query so we don't miss results that may be coded in the underlying data under an alias of the gene (e.g. from older data sets that use different nomenclature).
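Both use cases can be sketched with a local symbol table. The idea is to map every known symbol to one official symbol, then use that map to merge rows and to expand queries. This is a minimal sketch, not cBioPortal's API. `PREVIOUS_SYMBOLS` is an illustrative table: KMT2D/MLL2 comes from this issue, and the other rows are examples rather than an authoritative list. In practice such a table might be built from a gene nomenclature export such as HGNC's previous/alias symbol columns, but that is an assumption. The helper names (`build_canonical_map`, `normalize_maf`, `expand_query`) are hypothetical. `Hugo_Symbol` is the standard MAF gene column.

```python
import csv

# Illustrative official -> previous-symbol table (not authoritative).
PREVIOUS_SYMBOLS = {
    "KMT2A": ["MLL"],
    "KMT2C": ["MLL3"],
    "KMT2D": ["MLL2"],
}

def build_canonical_map(previous_symbols):
    """Map every known symbol (official or previous) to its official symbol.

    Raises if a symbol points at more than one official symbol, because
    silently picking one would merge the wrong genes.
    """
    canonical = {}
    for official, olds in previous_symbols.items():
        for sym in [official, *olds]:
            existing = canonical.setdefault(sym, official)
            if existing != official:
                raise ValueError(f"{sym!r} is ambiguous: {existing} vs {official}")
    return canonical

def normalize_maf(maf_text, canonical, column="Hugo_Symbol"):
    """Rewrite the gene column of a tab-separated MAF to official symbols."""
    # Skip '#' comment lines so the first remaining line is the header.
    lines = [ln for ln in maf_text.splitlines() if not ln.startswith("#")]
    rows = []
    for row in csv.DictReader(lines, delimiter="\t"):
        symbol = row[column]
        row[column] = canonical.get(symbol, symbol)  # unknown symbols pass through
        rows.append(row)
    return rows

def expand_query(gene, previous_symbols, canonical):
    """All symbols to include in a query for `gene`, official symbol first."""
    official = canonical.get(gene, gene)
    return [official, *previous_symbols.get(official, [])]
```

Normalizing before merging makes MLL2 and KMT2D rows collapse into one gene. `expand_query("MLL2", ...)` gives `["KMT2D", "MLL2"]`, which covers data coded under either name.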
I'm assuming the alias endpoint was set up primarily for querying in the UI, so it may not be appropriate for the use cases outlined above. If so, do you have a function or standardized dictionary of accepted gene names/aliases that you use to disambiguate?
Thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Commenting to keep this open, thanks