A question about the "With" field (affects InterMine)
Describe the issue/bug
I was not sure which tracker to put this on, so I am starting with helpdesk.
InterMine is loading pombe data for a PombeMine. However, InterMine treats the entries in the GO GAF "With" field as genes.
Some of the PomBase "With" field entries are not genes, so extra genes get added.
See https://github.com/intermine/pombemine/issues/51
I wonder if we should define a specific syntax for referring to individual isoforms in the "With" field?
I.e. DB:gene_symbol[isoform_symbol], so that it follows the same format as alleles?
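To make the proposal concrete, here is a minimal sketch of how a parser might split a value written in the suggested bracket form. The pattern, the `WithValue` type, and the `parse_with_value` helper are all hypothetical, and `PomBase:geneA[isoform1]` is a made-up placeholder value; nothing here reflects an agreed GAF syntax or any existing parser.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the proposed DB:gene_symbol[isoform_symbol] form.
# The bracketed qualifier is optional, so plain DB:ID values still parse.
WITH_VALUE = re.compile(
    r"^(?P<db>[^:\s]+):(?P<ident>[^\[\]\s]+)(?:\[(?P<qualifier>[^\[\]]+)\])?$"
)


class WithValue(NamedTuple):
    db: str                   # database prefix, e.g. "PomBase"
    ident: str                # identifier or symbol after the first colon
    qualifier: Optional[str]  # bracketed isoform/allele symbol, if present


def parse_with_value(value: str) -> WithValue:
    """Split one With/From value into prefix, identifier, and optional qualifier."""
    match = WITH_VALUE.match(value.strip())
    if match is None:
        raise ValueError(f"not a DB:ID value: {value!r}")
    return WithValue(match["db"], match["ident"], match["qualifier"])


# Placeholder value in the proposed form (assumption, not a real annotation).
print(parse_with_value("PomBase:geneA[isoform1]"))
# A plain identifier from this thread parses with no qualifier.
print(parse_with_value("PomBase:SPCC548.03c.1"))
```

A parser that does not know about the bracket form would keep `geneA[isoform1]` as a single opaque identifier, which is the main cost of introducing such a syntax.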

This is a slight edge case, but we have a family of selfish genes (meiotic drivers) in which the long isoform is the poison and the short isoform is the antidote. We can therefore annotate the different known isoforms of the family members based on isoforms from the closely related fission yeast, or from other family members.
(This is probably affecting other mines too. We only spotted it because I looked for genes without a feature type.)
@cmungall @vanaukenk
@ValWood Are SPCC548.03c.1 and SPCC548.03c.2 transcripts? If so, I believe the DB:sequence_id was meant to cover transcripts.
It seems this is an issue with InterMine assuming that all With/From values represent genes which, for GO, certainly is not the case.
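As a rough illustration of the point, the sketch below splits a With/From value into its `DB:ID` members and looks up each member's feature type instead of assuming "gene". It assumes the GAF 2.x convention of pipes separating alternatives and commas joining members that apply together; `split_with_from`, `feature_type`, and the `KNOWN_FEATURE_TYPES` table are hypothetical names, not InterMine code, and the table only holds the two transcript IDs mentioned in this thread.

```python
from typing import Dict, List, Tuple


def split_with_from(field: str) -> List[List[Tuple[str, str]]]:
    """Split a With/From value into pipe-separated groups of comma-separated (db, id) pairs."""
    groups = []
    for group in field.split("|"):
        members = []
        for item in group.split(","):
            item = item.strip()
            if not item:
                continue
            # Split on the first colon only, so IDs containing colons stay intact.
            db, _, ident = item.partition(":")
            members.append((db, ident))
        if members:
            groups.append(members)
    return groups


# Hypothetical lookup table; a real loader would consult the source database.
KNOWN_FEATURE_TYPES: Dict[str, str] = {
    "SPCC548.03c.1": "transcript",
    "SPCC548.03c.2": "transcript",
}


def feature_type(ident: str, known: Dict[str, str]) -> str:
    """Return the recorded feature type, or 'unknown' rather than defaulting to 'gene'."""
    return known.get(ident, "unknown")


for group in split_with_from("PomBase:SPCC548.03c.1|PomBase:SPCC548.03c.2"):
    for db, ident in group:
        print(db, ident, feature_type(ident, KNOWN_FEATURE_TYPES))
```

Returning "unknown" for unrecognised identifiers means a loader can skip or flag them rather than creating placeholder gene records.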
SPCC548.03c.1 and SPCC548.03c.2 are transcripts.
But it seems that at present we make a special case for alleles, where we specify [gene] [allele].
I would not read DB:sequence_id as including transcripts/isoforms (I assumed this was referring to accession numbers rather than symbols). It might be useful for the docs to be more explicit.
Anyway, I will report back to InterMine that they cannot assume that these identifiers are genes.
v
> But it seems that at present we make a special case for alleles where we specify [gene] [allele]
I had forgotten we have this. It doesn't seem well documented. I don't have the query at hand to see how often this is used, but an ad-hoc check reveals no usages in human or the MODs?
I'm guessing most parsers wouldn't deconstruct this and just treat this as an ID, and URL resolution would fail.
I think if we do want to refer to isoforms, we use an isoform ID the same way we would anywhere else, e.g. c17, with no ad-hoc syntaxes.
That makes sense.