dipper
OBI:0100026 as taxon for variant objects
This bug was originally raised by @iimpulse
For reference, the query: https://api-dev.monarchinitiative.org/api/bioentity/gene/MGI:98297/variants?fetch_objects=true&start=0&rows=10&facet=true&facet_fields=subject_taxon&taxon=OBI%3A0100026
yields gene-to-variant associations. However, in some of these associations the object has a BNODE prefix and a taxon of OBI:0100026 ('organism'). This makes it difficult to filter variants by taxon, since 'organism' is too generic a taxon term.
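To make the filtering problem concrete, here is a minimal Python sketch that rebuilds the query above and separates associations whose object carries the generic 'organism' taxon from the rest. The response shape (an `object` dict with a nested `taxon` dict holding an `id`) is an assumption about the API payload, and the helper names `build_url`, `has_generic_taxon`, and `split_by_taxon` are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint and parameters are taken from the query quoted above.
BASE = "https://api-dev.monarchinitiative.org/api/bioentity/gene/MGI:98297/variants"
GENERIC_TAXON = "OBI:0100026"  # 'organism'


def build_url(taxon=None, rows=10):
    """Rebuild the gene-to-variant query, optionally filtered by taxon."""
    params = {
        "fetch_objects": "true",
        "start": 0,
        "rows": rows,
        "facet": "true",
        "facet_fields": "subject_taxon",
    }
    if taxon:
        params["taxon"] = taxon
    return f"{BASE}?{urlencode(params)}"


def has_generic_taxon(association):
    """True if the association's object is typed only as 'organism'.

    Assumes each association looks like
    {"object": {"id": ..., "taxon": {"id": ...}}}; this layout is a guess.
    """
    obj = association.get("object") or {}
    taxon = obj.get("taxon") or {}
    return taxon.get("id") == GENERIC_TAXON


def split_by_taxon(associations):
    """Return (specific, generic) lists of associations."""
    generic = [a for a in associations if has_generic_taxon(a)]
    specific = [a for a in associations if not has_generic_taxon(a)]
    return specific, generic
```

Calling `build_url(GENERIC_TAXON)` reproduces the URL in this report. A client could use `split_by_taxon` as a stopgap to hide the affected associations, but that only masks the problem and does not assign a usable taxon.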
@cmungall @kshefchek Thoughts?
CC'ing @monicacecilia for her awesomeness!
We originally attempted to infer taxon on genotype parts instead of making an explicit edge for each (in retrospect, maybe a mistake). There's an inference path in the Solr loader that has been broken for some time, in the sense that it either doesn't infer the taxa or infers 'organism'.
tl;dr this should probably be fixed in dipper, and likely won't be fixed in the short term
see also - https://github.com/SciGraph/golr-loader/issues/10
That's good to know.
Thanks @kshefchek
So is this blocked by Dipper, the SciGraph loader, or both?
The way it works now, this could be fixed in either dipper or the golr-loader code. We could also add something in SciGraph, but it would need a new post-processor. I think the best thing to do is to add it in dipper.
Looking closer, many of these are transgenes, so which taxon applies? I would think the taxon in which the variant is studied, but that is not entirely accurate.
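As a rough illustration of the "explicit edge" option, the sketch below copies a genotype's taxon onto each of its direct parts that lacks one. It works on plain `(subject, predicate, object)` tuples rather than Dipper's actual graph classes, the function name `explicit_taxon_edges` is hypothetical, and using the host organism's taxon for transgenes is exactly the assumption questioned above, not a settled decision.

```python
IN_TAXON = "RO:0002162"   # 'in taxon'
HAS_PART = "BFO:0000051"  # 'has part'


def explicit_taxon_edges(triples):
    """Return new 'in taxon' triples for parts that have no taxon of their own.

    Only direct parts are handled (one hop); nested parts would need the
    function to be applied repeatedly or a proper graph traversal.
    """
    # Map every subject that already has a taxon to that taxon.
    taxon_of = {s: o for s, p, o in triples if p == IN_TAXON}
    new_edges = []
    for s, p, o in triples:
        # Copy the whole's taxon to a part only if the part has none.
        if p == HAS_PART and s in taxon_of and o not in taxon_of:
            new_edges.append((o, IN_TAXON, taxon_of[s]))
    return new_edges


# Hypothetical identifiers, for illustration only.
triples = [
    ("EX:genotype1", IN_TAXON, "NCBITaxon:10090"),
    ("EX:genotype1", HAS_PART, "EX:variant1"),
    ("EX:genotype1", HAS_PART, "EX:variant2"),
    ("EX:variant2", IN_TAXON, "NCBITaxon:7955"),
]
new_edges = explicit_taxon_edges(triples)
```

Here `EX:variant1` gets an explicit mouse taxon edge, while `EX:variant2` keeps the taxon it already has. Emitting such edges at ingest time would remove the need for the loader to infer taxon at all.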
@mbrush Are you able to join the monarch-ui call on Tuesday November 12?
This ticket is in relation to representation of variants, and we believe we need your help.
Hi. Happy to join the call on Tuesday. In the meantime, Appendix I of this document provides food for thought that I think is relevant to this topic. It gets pretty into the weeds concerning what it means to be a 'transgene' or an 'allele' from the GENO perspective. But the key bits are in the third paragraph, which starts with "An allele . . .". Copying key text below, but see the document for broader context.
An allele in GENO, including those caused by insertions, is an allele_of some reference genomic feature. This feature is typically a gene, but even insertions falling outside of genes are considered alleles_of the reference feature they alter (e.g. alleles of other named features such as QTLs). The feature or gene that an allele is an allele_of is entirely dependent on its genomic position, and not on the sequence content it contains. For example, insertion of the S. cerevisiae GAL4 gene sequence within the D. melanogaster Bx gene locus would create an allele_of this Bx gene, but the resulting transgene would not be considered an allele_of the S. cerevisiae GAL4 gene - because positionally it is not located in a yeast genome at the yeast GAL4 locus. Rather, GENO would say that this transgene derives_sequence_from the S. cerevisiae GAL4 gene.
I am confused about why we are talking about an OBI ID in the first place. We shouldn't be using the OBI class for organism.
@cmungall this comes from a multi-integration issue: first from running ELK on GENO, then from attempting to infer taxon via this graph path search: https://github.com/SciGraph/golr-loader/blob/master/src/main/java/org/monarch/golr/GolrLoader.java#L157
@kshefchek send along the list of troublesome IDs when you get a chance, and I'll figure out which Dipper ingests need to be corrected here