mme-apis
Capture that effects are transcript-specific
The current API has the problem that effects are specified without a corresponding transcript. The same variant may have different effects on different transcripts, and the user/system may want to transmit one or more of them.
Combined with having a list of genes and list of variants per genomicFeature (#114), the data model is getting quite complex. I would propose one of two approaches.
- One solution would involve specifying the effect per-transcript in a way that handles multiple effects per variant. For example, instead of each variant having:
"type": { "id": "SO:0123456" }
each variant would have something like:
"effects": [ { "transcript": { "id": "Ensembl:ENST000012345" }, "effect": { "id": "SO:0123456" } }, ... ]
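To make option 1 concrete, here is a minimal sketch of how a variant in the current shape could be rewritten into the proposed `effects` list. The helper name `to_effects_format` and its `transcript_id` parameter are hypothetical and not part of the API; the IDs are the placeholder values from the example above.

```python
import json


def to_effects_format(variant, transcript_id=None):
    """Return a copy of `variant` with the single "type" moved into an "effects" list.

    Hypothetical helper for illustration only; the final field names are still
    under discussion in this issue.
    """
    # Copy every field except the legacy single-valued "type".
    converted = {key: value for key, value in variant.items() if key != "type"}
    effects = list(variant.get("effects", []))
    if "type" in variant:
        entry = {"effect": variant["type"]}
        # The transcript is optional: omit it when the sender does not know it.
        if transcript_id is not None:
            entry["transcript"] = {"id": transcript_id}
        effects.append(entry)
    converted["effects"] = effects
    return converted


old_variant = {"type": {"id": "SO:0123456"}}
new_variant = to_effects_format(old_variant, "Ensembl:ENST000012345")
print(json.dumps(new_variant, indent=2))
```

A variant with effects on several transcripts would simply carry several entries in the list, each with its own transcript.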
- Instead, we could break away from each individual genomicFeature having a list of genes, a list of variants, and potentially a list of transcripts, and instead adopt a tuple-like approach to storing genomicFeatures. Each genomicFeature would essentially be a tuple of (gene, transcript, variant, effect), where one or more of those may be empty. Since each of these would be a single value, it differs from the current API in several notable ways:
  - multiple variants in the same gene would be represented as multiple genomicFeatures (with redundant gene)
  - multiple genes overlapping one variant would be represented as multiple genomicFeatures (with redundant variant)
  - this is certainly simpler to specify, but may be more complex to compute on or visualize
  - it is unlikely to match any individual MM's internal data model, but may be more effective for data transfer
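As a rough illustration of the tuple-like approach, the sketch below models each genomicFeature as a flat (gene, transcript, variant, effect) record and shows the regrouping a receiver would need to do to recover "all variants in a gene". The class name, the `variants_by_gene` helper, and the gene and variant identifiers are all made up for this example.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class GenomicFeature(NamedTuple):
    """One flat record; any field may be None (empty)."""
    gene: Optional[str] = None
    transcript: Optional[str] = None
    variant: Optional[str] = None
    effect: Optional[str] = None


# Hypothetical data: two variants in GENE_A (gene repeated), and variant-2
# also overlapping GENE_B (variant repeated).
features = [
    GenomicFeature("GENE_A", "Ensembl:ENST000012345", "variant-1", "SO:0123456"),
    GenomicFeature("GENE_A", "Ensembl:ENST000012345", "variant-2", "SO:0123456"),
    GenomicFeature("GENE_B", None, "variant-2", None),
]


def variants_by_gene(records):
    """Regroup flat records into a gene -> sorted list of variants mapping."""
    grouped = defaultdict(set)
    for record in records:
        if record.gene is not None and record.variant is not None:
            grouped[record.gene].add(record.variant)
    return {gene: sorted(variants) for gene, variants in grouped.items()}


print(variants_by_gene(features))
```

The records themselves are trivial to specify, but any per-gene or per-variant view has to be rebuilt on the receiving side, which is the trade-off noted in the list above.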
Thoughts?
I'm not sure if this is needed; however, it could be beneficial.
For some clarification: the related problem in #110 was that a variant had a consequence which was not the most severe. This was less a problem and more an observation, assuming the most severe consequence is the most relevant when no transcript is given.
Our algorithm for matching / finding similar variants is based on the consequence (effect). Which transcript it applies to is of less consequence, but may still be clinically interesting.
In the Decipher system, the transcript for an SNV is required. This means we can compute the consequence (via the Ensembl API).
I'm unclear as to what the problem is that this issue is trying to solve. If it's purely informational, that's OK, but I'm wondering if there's something deeper I'm missing. Wouldn't option 2 be less effective for data transfer rather than more effective, as we would be essentially de-normalising the data?
Assuming this is informational, I'd opt for option 1.
Unless there are any objections, I will opt for option 1. I'll create some sample JSON for this and make a pull request.
I'm in favour of allowing multiple consequences and/or transcripts to be transmitted, but I don't understand the value or use case for including the transcript the consequence was acquired from.
Regardless, consequence aka "type" should become an array. I'm also for renaming to "effect" with the sub object name of "consequence".
Thoughts?