phenopacket-format
Variant representation
We should discuss how to best represent variants. Probably we need something flexible like
Apologies, the commit above appears to be unrelated
This is what we have as an example:
```yaml
schema: phenopacket-level-1
comment: This is an example phenopacket containing one variant to phenotype association
ontologies:
  - id: hp
    version: "2016-02-01"
variants:
  - id: _:v1
    positions:
      - type: HGVS
        value: "NM_123:c.-123C>T"
phenotype_profile:
  - entity: _:v1
    evidence:
      type: TAS
      source:
        id: PMID:FAKE1234
        title: Mutations in NM_123 cause multisystem proteinopathy and ALS
    phenotype:
      type:
        id: HP:0003560
        label: Muscular dystrophy
      onset:
        type:
          id: HP:0003584
          label: Late onset
      description: blah blah
created: 2016-01-14
contributors:
  - id: ORCID:nnnn-nnnn-nnnn
```
On the one hand this is scope creep. On the other hand it is very useful in practice. The approach is to be modular: the variant part is separable, so it can be represented outside the packet and referenced, or embedded in it. The same approach applies to pedigrees (ped).
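One way to read the "referenced or embedded" idea, sketched in Python: an association's `entity` is either a string id pointing at a variant declared in the packet's `variants` list, or the variant object itself embedded inline. The dict layout mirrors the YAML example; `resolve_entity` and `resolve_packet` are hypothetical helpers, not part of any reference implementation.

```python
from typing import Dict, List, Union

# Assumed shape: a variant is either referenced by id (e.g. "_:v1")
# or embedded inline as a dict.
VariantOrRef = Union[str, dict]


def resolve_entity(entity: VariantOrRef, variants_by_id: Dict[str, dict]) -> dict:
    """Return the variant an association points at.

    Embedded variants (dicts) are returned unchanged; string ids are looked
    up among the variants declared in the packet's "variants" list.
    """
    if isinstance(entity, dict):
        return entity
    try:
        return variants_by_id[entity]
    except KeyError:
        raise KeyError(f"unresolved variant reference: {entity}") from None


def resolve_packet(packet: dict) -> List[dict]:
    """Resolve the entity of every phenotype_profile entry in a packet."""
    variants_by_id = {v["id"]: v for v in packet.get("variants", [])}
    return [
        resolve_entity(assoc["entity"], variants_by_id)
        for assoc in packet.get("phenotype_profile", [])
    ]


# One referenced variant and one embedded variant in the same packet.
packet = {
    "variants": [{"id": "_:v1", "positions": [{"type": "HGVS", "value": "NM_123:c.-123C>T"}]}],
    "phenotype_profile": [
        {"entity": "_:v1"},
        {"entity": {"id": "_:v2", "positions": []}},
    ],
}
print([v["id"] for v in resolve_packet(packet)])  # ['_:v1', '_:v2']
```

Either form resolves to the same kind of object, so downstream code need not care whether the variant lived inside the packet or was defined alongside it.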
Can someone take a shot at making some fake examples? We will derive the model from these.
@cmungall @pnrobinson @jmcmurry: Why are we not adapting the MME schema for variants? It is fairly comprehensive and would enable PXF to be aligned with it. If you agree, I can have a first stab at implementing it.
Can you have a go at a PR on the reference implementation?
There is also the main GA4GH variant representation. But why don't you take a first pass at a PR on the reference implementation?
On 5 Apr 2016, at 21:18, tudorgroza wrote:
@cmungall @pnrobinson @jmcmurry: Why are we not adapting the MME schema for variants? It is fairly comprehensive and would enable PXF to be aligned with it. If you agree, I can have a first stab at implementing it.
You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/phenopackets/phenopacket-format/issues/10#issuecomment-206112165
Tudor and I just discussed this. I would suggest that we design the format to be easily extensible to other lab abnormalities, say a paper about a protein biomarker and some disease, or ISCN, glycomics, and metabolomics. That might be a lot for v1. Cheers, Peter
Dr. med. Peter N. Robinson, MSc.
Professor of Medical Genomics
Professor of Bioinformatics, Freie Universität Berlin
Institut für Medizinische Genetik und Humangenetik
Charité - Universitätsmedizin Berlin
Augustenburger Platz 1
13353 Berlin
Germany
+4930 450566006
Mobile: 0160 93769872
http://compbio.charite.de
http://www.human-phenotype-ontology.org
I have learned from my mistakes, and I am sure I can repeat them exactly
ORCID ID: http://orcid.org/0000-0002-0736-9199
Scopus Author ID 7403719646
Appointment request: http://doodle.com/pnrobinson
From: Chris Mungall Sent: Wednesday, 6 April 2016 07:12 To: phenopackets/phenopacket-format Cc: Robinson, Peter Subject: Re: [phenopackets/phenopacket-format] Variant representation (#10)
On 5 Apr 2016, at 22:27, Peter Robinson wrote:
Tudor and I just discussed this. I would suggest that we design the format to be easily extensible to other lab abnormalities - say a paper about a protein biomarker and some disease. Or ISCN, glycomics, and metabolomics. Might be a lot for v1
I'm not totally following the relevance to this ticket (other than ISCN).
Just a clarifying note about versions and levels. These are in theory orthogonal. Think OWL profiles and OWL versions, or GO-vs-GO-slims and GO versions. Version updates will be about clarifying semantics, making improvements not related to expressivity, etc. Versions should stabilize somewhat after v1. Levels are more like profiles or subsets.
Having said that, since we switched to JSON Schema, everything is rolled into the same level. It's actually easier to make the more complete model and then think about the kinds of profiles we would derive from it. It's also likely that we won't be able to capture everything in v1, and some of the higher-level stuff will appear in future versions. This is just a cautionary note against equating versions with expressivity and flexibility.
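To make "profiles derived from the complete model" concrete, here is a minimal Python sketch, assuming a profile is nothing more than a subset of the complete model's top-level fields. The field names come from the YAML example in this thread; the `minimal` profile, `disallowed_fields`, and `conforms` are hypothetical and not an agreed definition.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical top-level fields of the complete model, taken from the YAML
# example in this thread; not an agreed schema.
FULL_MODEL: FrozenSet[str] = frozenset({
    "schema", "comment", "ontologies", "variants",
    "phenotype_profile", "created", "contributors",
})

# Hypothetical profiles: each is just a subset of the full model,
# analogous to an OWL profile or a GO slim.
PROFILES: Dict[str, FrozenSet[str]] = {
    "minimal": frozenset({"schema", "ontologies", "phenotype_profile"}),
    "full": FULL_MODEL,
}

# Deriving profiles from the complete model means every profile is a subset.
assert all(fields <= FULL_MODEL for fields in PROFILES.values())


def disallowed_fields(packet: dict, profile: str) -> Set[str]:
    """Top-level keys in `packet` that `profile` does not permit."""
    return set(packet) - PROFILES[profile]


def conforms(packet: dict, profile: str) -> bool:
    return not disallowed_fields(packet, profile)


# The check ignores any version: in this sketch a profile constrains
# expressivity only, and versioning would be tracked separately.
packet = {"schema": "phenopacket-level-1", "variants": [], "phenotype_profile": []}
print(conforms(packet, "full"), disallowed_fields(packet, "minimal"))
# True {'variants'}
```

Building the full model first and carving profiles out of it keeps the subset relation checkable, which is the point of deriving profiles rather than designing each level separately.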
Let's capture some of these requirements (e.g. glycomics) in separate tickets.
@cmungall : Ok. Can you please have a look at the current PR I've put in?
Thanks!
So Association was originally conceived of as an association between a thing (like a person, disease, or variant) and an ontological description of that thing. Of course it makes perfect sense to genericise this somewhat for person-variant associations, but I'll need to think it through to make sure that no assumptions are broken. But this can happen later.
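The assumption at risk can be shown with a small Python sketch. Assuming an association has a subject and a target, the original design always used an ontology class as the target; a person-variant association instead links two entities. `EntityKind`, `EntityRef`, `OntologyClass`, and `Association` here are hypothetical illustrations, not the reference implementation's classes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Union


class EntityKind(Enum):
    # Kinds of thing an association subject can be (hypothetical enumeration).
    PERSON = "person"
    DISEASE = "disease"
    VARIANT = "variant"


@dataclass(frozen=True)
class EntityRef:
    id: str
    kind: EntityKind


@dataclass(frozen=True)
class OntologyClass:
    id: str  # CURIE, e.g. "HP:0003560"
    label: str = ""


@dataclass
class Association:
    """Subject plus target; the target was originally always an ontology class."""
    subject: EntityRef
    target: Union[OntologyClass, EntityRef]
    evidence: List[str] = field(default_factory=list)

    def is_ontological(self) -> bool:
        # True for the original use case (thing -> ontological description);
        # False for a generalized entity-to-entity link such as person-variant.
        return isinstance(self.target, OntologyClass)


variant = EntityRef("_:v1", EntityKind.VARIANT)
phenotype = Association(variant, OntologyClass("HP:0003560", "Muscular dystrophy"))
carrier = Association(EntityRef("patient-1", EntityKind.PERSON), variant)
print(phenotype.is_ontological(), carrier.is_ontological())  # True False
```

Any code that assumes `target` is always an ontology term would need the `is_ontological` check (or an equivalent) once person-variant links are allowed.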
On 5 Apr 2016, at 23:23, tudorgroza wrote:
@cmungall : Ok. Can you please have a look at the current PR I've put in?
Thanks. I'll add it to PA and see what other things are missing.
I think we should leave out the HGVS description; it only applies to humans, and we want this to be more generic than that.
Also I think we should follow the GA4GH variant schema more closely. The MME one is pretty closely aligned to this anyway. We'll only be able to capture SNPs and indels, but that's the current state of things.
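A minimal Python sketch of a variant record following the GA4GH schema more closely, assuming the field set of the GA4GH `Variant` record (`referenceName`, `start`, `end`, `referenceBases`, `alternateBases`, converted to snake_case here) and its 0-based, half-open coordinates. The `classify` helper is hypothetical; it only illustrates that such a record naturally covers SNPs and indels, with everything else falling through to "other".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    # Field names follow the GA4GH Variant record, converted to snake_case.
    # Coordinates are assumed 0-based and half-open, [start, end).
    reference_name: str
    start: int
    end: int
    reference_bases: str
    alternate_bases: List[str]

    def __post_init__(self) -> None:
        if self.end - self.start != len(self.reference_bases):
            raise ValueError("end - start must equal len(reference_bases)")


def classify(ref: str, alt: str) -> str:
    """Rough SNP/indel classification of one ref/alt pair."""
    if alt.startswith("<"):
        return "other"  # symbolic allele such as <DEL>: outside SNP/indel scope
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # equal-length multi-base change


v = Variant("1", 99, 100, "C", ["T", "CA"])
print([classify(v.reference_bases, alt) for alt in v.alternate_bases])
# ['SNP', 'indel']
```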
We also need the ability to link out to other sources, e.g. VCF files. Probably a simple URI will suffice?
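If a simple URI does suffice, the record can stay very small. A minimal Python sketch, assuming a hypothetical `ExternalSource` record with a `uri` and an optional free-text format label, validated only with the standard library's `urllib.parse.urlparse`; the scheme allow-list and the example URLs are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Hypothetical allow-list of URI schemes for external files.
ALLOWED_SCHEMES = {"http", "https", "ftp", "file"}


@dataclass(frozen=True)
class ExternalSource:
    """A link out to an external resource, such as a VCF file."""
    uri: str
    file_format: Optional[str] = None  # free-text label, e.g. "VCF"

    def __post_init__(self) -> None:
        parsed = urlparse(self.uri)
        if parsed.scheme not in ALLOWED_SCHEMES:
            raise ValueError(f"unsupported URI scheme: {parsed.scheme!r}")
        # Network URIs need a host; file:///path URIs legitimately have none.
        if parsed.scheme != "file" and not parsed.netloc:
            raise ValueError(f"URI has no host: {self.uri}")


src = ExternalSource("https://example.org/data/sample.vcf.gz", "VCF")
local = ExternalSource("file:///data/sample.vcf")
```

A bare filename such as `"sample.vcf"` has no scheme and is rejected, which keeps links unambiguous when a packet is moved between systems.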
On 6 May 2016, at 3:51, Jules Jacobsen wrote:
Also I think we should follow the GA4GH variant schema more closely. The MME one is pretty closely aligned to this anyway. We'll only be able to capture SNPs and indels, but that's the current state of things.
That's fine - we will use a genotype object for other scenarios