review causes of failed clinvar tests

Open reece opened this issue 8 years ago • 6 comments

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


#361 created new tests using ClinVar. These tests cover GRCh37 and GRCh38.

During the selection of tests, it was discovered that ~10% of the expected outputs did not match those of hgvs. Examples are:

  • AssertionError: c_to_p(NM_005763.3:c.1601_1609delGTAAACAAG): got NP_005754.2:p.Cys534Ter; expected on of NP_005754.2:p.Cys534_Ala871delinsTer
    Deletion through end of sequence is discouraged by recommendations.

  • AssertionError: g_to_t(NC_000019.10:g.1047511_1047516delAGCAGG,NM_019112.3): got NM_019112.3:c.2126_2131delAGCAGG; expected NM_019112.3:c.2124_2130del7
    delN is deprecated, I think.

  • AssertionError: c_to_p(NM_030957.2:c.709C>T): got NP_112219.2:p.Arg237Ter; expected on of NP_112219.3:p.Arg237Ter
    Wrong NM-NP association per NCBI web site.

  • AssertionError: c_to_p(NM_000642.2:c.4529dupA): got NP_000633.2:p.Tyr1510Ter; expected on of NP_000633.2:p.Tyr1510TerfsTer
    TerfsTer?!

  • AssertionError: g_to_t(NC_000001.10:g.94980759G>A,NM_002858.3): got NM_002858.3:c.1902+1G>A; expected NM_002858.3:c.1902_1903insATTTGTATTTCTTTCATTGAATG
    Perhaps ClinVar is normalizing against genomic sequence?

  • AssertionError: g_to_t(NC_000002.11:g.169780331dupG,NM_003742.2): got NM_003742.2:c.3767dupC; expected NM_003742.2:c.3767_3768insC

  • start or end or both are beyond the bounds of transcript record
    The transcript variant specifies coordinates beyond the transcript. Oops.

Some of the clinvar variants are clearly wrong (TerTer, for example). However, others reflect (at least) shortcomings of the hgvs package and may be bugs.

The file tests/data/clinvar.gz contains tests commented out with the reasons for each.

The goal for this issue is to triage these errors and, as necessary, create new issues to address problems.
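
As a starting point for that triage, here is a minimal sketch that buckets the "got X; expected Y" messages quoted above by the kind of disagreement. It is plain-stdlib Python and does not call hgvs. The category names and the `classify` helper are hypothetical, made up for illustration, and the rules only cover the patterns visible in the examples in this issue.

```python
import re

# Shape of the assertion messages quoted in this issue. Some messages read
# "expected on of ..." verbatim, so both "on of" and "one of" are tolerated.
MISMATCH_RE = re.compile(
    r"AssertionError: (?P<op>\w+)\((?P<args>[^)]*)\): "
    r"got (?P<got>\S+); expected (?:one? of )?(?P<expected>\S+)"
)


def _split(variant):
    """Split 'NM_003742.2:c.3767dupC' into (accession, posedit)."""
    accession, _, posedit = variant.partition(":")
    return accession, posedit


def classify(message):
    """Return a hypothetical triage bucket for one failure message."""
    m = MISMATCH_RE.search(message)
    if m is None:
        return "unparsed"  # e.g. the out-of-bounds transcript message
    got_acc, got_edit = _split(m["got"])
    exp_acc, exp_edit = _split(m["expected"])
    if got_acc != exp_acc:
        # Same base accession but different version (NP_x.2 vs NP_x.3)?
        if got_acc.split(".")[0] == exp_acc.split(".")[0]:
            return "accession-version"
        return "accession"
    if "dup" in got_edit and "ins" in exp_edit:
        return "dup-vs-ins"
    if "del" in got_edit and "del" in exp_edit:
        return "del-syntax"
    return "other"
```

Feeding every commented-out line of the test file through `classify` and counting the buckets (for instance with `collections.Counter`) would show which categories are large enough to deserve their own issue. Anything landing in "other" still needs a human look.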


  • Bitbucket: https://bitbucket.org/biocommons/hgvs/issue/380

reece, Oct 02 '16 00:10

Update to this. I'm parsing a clinvar file now and got this error:

HGVSParseError: NM_007194.4(CHEK2):c.1135_1136TC[2]: char 32: expected the character '='

I checked the recommendations and the c.1135_1136TC[2] looks ok per https://varnomen.hgvs.org/recommendations/RNA/variant/repeated/
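
Counting from zero, offset 32 in that string is the `[`, which suggests the bracketed repeat count is what the parser rejects. Until that syntax is supported, one way to keep a batch run going is to set such records aside before parsing. The sketch below is plain-stdlib Python and does not call hgvs; `repeat_info` and `partition_names` are hypothetical helper names, and the pattern only recognizes a trailing run of bases followed by a bracketed count.

```python
import re

# A trailing repeat unit such as "TC[2]": bases followed by a bracketed count.
REPEAT_RE = re.compile(r"(?P<unit>[ACGTUacgtu]+)\[(?P<count>\d+)\]$")


def repeat_info(name):
    """Return (unit, count) if the name ends in repeat syntax, else None."""
    m = REPEAT_RE.search(name)
    if m is None:
        return None
    return m["unit"], int(m["count"])


def partition_names(names):
    """Split names into (to_parse, repeats) so repeats can be handled apart."""
    to_parse, repeats = [], []
    for name in names:
        (repeats if repeat_info(name) else to_parse).append(name)
    return to_parse, repeats
```

With the variant above, `repeat_info("NM_007194.4(CHEK2):c.1135_1136TC[2]")` gives `("TC", 2)`, so the record ends up in the `repeats` list instead of raising during the parse loop.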

deannachurch, Jan 02 '20 23:01

Another ClinVar failure. Parsing this variant: NM_007194.4(CHEK2):c.3G>T

To get the protein variant, hgvs returns this: NP_009125.1:p.Met1?

But, it should be: NP_009125.1:p.Met1Ile (from ClinVar)

deannachurch, Jan 03 '20 17:01

Hi @deannachurch

Per past discussions here (#566) and on the VariantValidator project (#86), any mutation in the start codon should be marked as M1?, because the amino acid change fundamentally disrupts the translation signal. Without a valid start codon, the effect is unknown.

The HGVS society also addressed this in a recent post (https://www.facebook.com/HGVSmutnomen/posts/2430762803629529).
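
For anyone comparing hgvs output against ClinVar in bulk, this convention can be checked up front: if a coding variant overlaps c.1 to c.3, expect p.Met1? rather than a specific substitution. A minimal sketch in plain-stdlib Python follows. It does not call hgvs, `affects_start_codon` is a hypothetical name, and it deliberately handles only simple numeric c. positions. Intronic offsets (c.3+1), 5' UTR (c.-5) and 3' UTR (c.*5) positions are reported as not affecting the start codon.

```python
import re

# Simple c. positions only: "c.3G>T" or "c.2_5del". The negative lookahead
# rejects intronic offsets such as "c.3+1G>A" instead of truncating them.
SIMPLE_C_RE = re.compile(r":c\.(?P<start>\d+)(?:_(?P<end>\d+))?(?![\d+\-])")


def affects_start_codon(name):
    """True if a simple c. variant overlaps coding positions 1 to 3."""
    m = SIMPLE_C_RE.search(name)
    if m is None:
        return False  # not a simple c. position; this sketch makes no claim
    start = int(m["start"])
    end = int(m["end"]) if m["end"] else start
    return start <= 3 and end >= 1
```

For the variant discussed here, `affects_start_codon("NM_007194.4(CHEK2):c.3G>T")` is true, so a mismatch between p.Met1? and ClinVar's p.Met1Ile can be filed as a known difference in convention rather than as a failure.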

akeeeshi, Jan 03 '20 17:01

Thanks for the update. It would help, then, if the code returned something more usable. When I try to pull information from the protein variant, I get an exception rather than a value: pvar.posedit.pos.start.base

Is that possible?
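
In the meantime, a caller-side workaround is to walk the attribute chain defensively. The sketch below assumes the exception is an AttributeError caused by a missing or None attribute somewhere along `posedit.pos.start.base`. The thread does not say which exception is raised, so the `except` clause may need adjusting. `safe_attr` is a hypothetical helper, and the objects in the usage note are stand-ins, not hgvs objects.

```python
def safe_attr(obj, path, default=None):
    """Follow a dotted attribute path, returning default if any step fails."""
    current = obj
    for name in path.split("."):
        try:
            current = getattr(current, name)
        except AttributeError:
            # Covers both a missing attribute and an intermediate None.
            return default
    return current
```

Usage would look like `safe_attr(pvar, "posedit.pos.start.base")`, which returns None instead of raising when the position is not available. With a stand-in such as `types.SimpleNamespace(posedit=types.SimpleNamespace(pos=None))`, the call returns the default.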

deannachurch, Jan 03 '20 18:01

Right now, it's not possible. But, it's highly desired.

https://github.com/biocommons/hgvs/issues/333 has an explanation of why things are the way they are.

We can definitely do better here, but it's significant work and has been lower priority than other work.

-Reece

reece, Jan 04 '20 00:01

I get it, and I can work around it for now. Thanks for the hard, unpaid labor here!

deannachurch, Jan 04 '20 00:01

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot], Feb 26 '24 01:02

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot], Mar 06 '24 01:03