hgvs
review causes of failed clinvar tests
Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)
#361 created new tests using ClinVar. These tests cover GRCh37 and GRCh38.
During the selection of tests, it was discovered that ~10% of the expected outputs did not match those of hgvs. Examples are:
- AssertionError: c_to_p(NM_005763.3:c.1601_1609delGTAAACAAG): got NP_005754.2:p.Cys534Ter; expected on of NP_005754.2:p.Cys534_Ala871delinsTer
  deletion through end of sequence is discouraged by recommendations
- AssertionError: g_to_t(NC_000019.10:g.1047511_1047516delAGCAGG,NM_019112.3): got NM_019112.3:c.2126_2131delAGCAGG; expected NM_019112.3:c.2124_2130del7
  delN is deprecated, I think
- AssertionError: c_to_p(NM_030957.2:c.709C>T): got NP_112219.2:p.Arg237Ter; expected on of NP_112219.3:p.Arg237Ter
  wrong NM-NP association per NCBI web site
- AssertionError: c_to_p(NM_000642.2:c.4529dupA): got NP_000633.2:p.Tyr1510Ter; expected on of NP_000633.2:p.Tyr1510TerfsTer
  TerfsTer?!
- AssertionError: g_to_t(NC_000001.10:g.94980759G>A,NM_002858.3): got NM_002858.3:c.1902+1G>A; expected NM_002858.3:c.1902_1903insATTTGTATTTCTTTCATTGAATG
  Perhaps ClinVar is normalizing against genomic sequence?
- AssertionError: g_to_t(NC_000002.11:g.169780331dupG,NM_003742.2): got NM_003742.2:c.3767dupC; expected NM_003742.2:c.3767_3768insC
- start or end or both are beyond the bounds of transcript record
  The transcript variant specifies coordinates beyond the transcript. Oops.
Some of the clinvar variants are clearly wrong (TerTer, for example). However, others reflect (at least) shortcomings of the hgvs package and may be bugs.
The file tests/data/clinvar.gz contains tests commented out with the reasons for each.
The goal for this issue is to triage these errors and, as necessary, create new issues to address problems.
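As a starting point for the triage, the failure messages quoted above could be bucketed mechanically before anyone looks at them by hand. The following is only a rough sketch: the regular expression is inferred from the example messages in this issue, and the names `FAILURE_RE`, `classify`, and `triage` as well as the category labels are hypothetical, not part of hgvs or its test suite.

```python
import re
from collections import Counter

# Pattern inferred from the messages quoted in this issue, e.g.
# "AssertionError: c_to_p(X): got Y; expected on of Z"
FAILURE_RE = re.compile(
    r"AssertionError: (?P<op>\w+)\((?P<args>[^)]*)\): "
    r"got (?P<got>[^;\s]+); expected (?:on of |one of )?(?P<expected>\S+)"
)


def classify(got, expected):
    """Assign a coarse, hypothetical triage category to one mismatch."""
    got_acc, _, got_edit = got.partition(":")
    exp_acc, _, exp_edit = expected.partition(":")
    if got_acc != exp_acc:
        # e.g. NP_112219.2 vs NP_112219.3
        return "accession mismatch"
    if "TerfsTer" in exp_edit or "TerTer" in exp_edit:
        # expected value looks malformed on the ClinVar side
        return "suspect expected value"
    if "dup" in got_edit and "ins" in exp_edit:
        # same event written as dup by hgvs and as ins by ClinVar
        return "dup vs ins"
    return "needs review"


def triage(lines):
    """Count failure messages per category; unparseable lines are counted too."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match is None:
            counts["unparsed"] += 1
            continue
        counts[classify(match["got"], match["expected"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "AssertionError: c_to_p(NM_030957.2:c.709C>T): got NP_112219.2:p.Arg237Ter; "
        "expected on of NP_112219.3:p.Arg237Ter",
        "AssertionError: g_to_t(NC_000002.11:g.169780331dupG,NM_003742.2): "
        "got NM_003742.2:c.3767dupC; expected NM_003742.2:c.3767_3768insC",
    ]
    for category, n in triage(sample).items():
        print(category, n)
```

Anything landing in "needs review" still needs a human decision on whether it is a ClinVar problem or an hgvs bug.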
- Bitbucket: https://bitbucket.org/biocommons/hgvs/issue/380
Update to this. I'm parsing a clinvar file now and got this error:
HGVSParseError: NM_007194.4(CHEK2):c.1135_1136TC[2]: char 32: expected the character '='
I checked the recommendations and the c.1135_1136TC[2] looks ok per https://varnomen.hgvs.org/recommendations/RNA/variant/repeated/
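Until the parser accepts this syntax, one possible workaround when reading a ClinVar file is to set aside variants in repeated-sequence notation before handing them to hgvs. A minimal sketch, assuming only the simple `sequence[count]` form shown above; `REPEAT_RE` and `is_repeat_notation` are hypothetical names, and the pattern does not try to cover the full recommendation.

```python
import re

# Simple repeated-sequence form: ":c." (or g/m/n/r), a position or range,
# a sequence unit, then "[count]" at the end of the string.
REPEAT_RE = re.compile(r":[cgmnr]\.[^\[\]]*[ACGTUacgu]+\[\d+\]$")


def is_repeat_notation(variant):
    """Return True if the variant string ends in a sequence[count] repeat."""
    return REPEAT_RE.search(variant) is not None


if __name__ == "__main__":
    for v in ("NM_007194.4(CHEK2):c.1135_1136TC[2]", "NM_007194.4(CHEK2):c.3G>T"):
        print(v, is_repeat_notation(v))
```

Variants flagged this way can be logged and skipped rather than aborting the whole run on an HGVSParseError.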
Another ClinVar failure. Parsing this variant: NM_007194.4(CHEK2):c.3G>T
To get the protein variant, hgvs returns this: NP_009125.1:p.Met1?
But, it should be: NP_009125.1:p.Met1Ile (from ClinVar)
Hi @deannachurch
Per past discussions here (#566) and on the VariantValidator project (#86), any mutation in the start codon should be marked as M1?, because the amino acid change fundamentally disrupts the translation signal. Without a valid start codon the effect is unknown.
This question was also addressed by the HGVS society in a recent post (https://www.facebook.com/HGVSmutnomen/posts/2430762803629529).
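For anyone comparing hgvs output against ClinVar, it may help to flag these cases up front so that a `p.Met1?` result is not counted as a mismatch. The sketch below is an illustration only: `hits_start_codon` is a hypothetical helper, it handles just simple coding substitutions such as `c.3G>T`, and it treats c.1 to c.3 as the start codon.

```python
import re

# Simple coding substitution, e.g. "c.3G>T". Intronic offsets (c.1902+1G>A)
# and UTR positions (c.-3G>T) deliberately do not match.
SUBSTITUTION_RE = re.compile(r":c\.(\d+)[ACGT]>[ACGT]$")


def hits_start_codon(variant):
    """Return True if a simple c. substitution falls within c.1 to c.3."""
    match = SUBSTITUTION_RE.search(variant)
    if match is None:
        return False
    return 1 <= int(match.group(1)) <= 3


if __name__ == "__main__":
    for v in ("NM_007194.4(CHEK2):c.3G>T", "NM_030957.2:c.709C>T"):
        print(v, hits_start_codon(v))
```

When this returns True, a ClinVar value such as `p.Met1Ile` and an hgvs value of `p.Met1?` describe the same variant under different conventions.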
Thanks for the update. It would be useful then if the code returned something more useful. When I try to pull information on the protein I get an exception rather than useful information: pvar.posedit.pos.start.base
Is that possible?
Right now, it's not possible. But, it's highly desired.
https://github.com/biocommons/hgvs/issues/333 has an explanation of why things are the way they are.
We can definitely do better here, but it's significant work and has been lower priority than other work.
-Reece
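In the meantime, a caller-side workaround could look roughly like the following. This is a sketch under assumptions: `safe_attr` is a hypothetical helper, it assumes the failure surfaces as an `AttributeError` somewhere along the attribute chain, and the objects in the example are plain stand-ins rather than real hgvs variants.

```python
from types import SimpleNamespace


def safe_attr(obj, dotted_path, default=None):
    """Walk a dotted attribute path, returning default if any step is missing."""
    current = obj
    for name in dotted_path.split("."):
        try:
            current = getattr(current, name)
        except AttributeError:
            # Assumption: a missing or placeholder component raises AttributeError
            return default
    return current


if __name__ == "__main__":
    # Stand-in objects for illustration only; not real hgvs classes.
    complete = SimpleNamespace(
        posedit=SimpleNamespace(
            pos=SimpleNamespace(start=SimpleNamespace(base=237))
        )
    )
    uncertain = SimpleNamespace(posedit=None)
    print(safe_attr(complete, "posedit.pos.start.base"))
    print(safe_attr(uncertain, "posedit.pos.start.base", default="unknown"))
```

This only hides the exception; it does not recover the position for variants like `p.Met1?`.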
I get it- and I can work around now. Thanks for the hard, unpaid labor here!
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.