hgvs icon indicating copy to clipboard operation
hgvs copied to clipboard

Question about HGVS representation for termination retention

Open akeeeshi opened this issue 5 years ago • 6 comments

The variant below retains the stop codon of the transcript. It produces a posedit at the var_p level that is represented as (Ter659Ter). For any other coding-silent variant the system annotates the variant in the (AA#=) format.

Is this a particular reason for representing the HGVS for p_dot in this fashion for the stop codon? Are there concerns for representing this as (Ter659=)?

Our team would be happy to put out a PR for this if their is consensus on this being desired behavior.


In [7]: hgvs_g = 'NC_000009.11:g.130577961C>T' In [8]: var_g = hp.parse_hgvs_variant(hgvs_g) In [9]: hp = hgvs.parser.Parser() In [10]: hdp = hgvs.dataproviders.uta.connect() In [11]: am = hgvs.assemblymapper.AssemblyMapper(hdp, assembly_name='GRCh37', alt_aln_method='splign', replace_reference=True)

In [12]: transcripts = am.relevant_transcripts(var_g)

In [14]: transcripts Out[14]: ['NM_001278138.1', 'NM_000118.2', 'NM_001114753.2', 'NM_001114753.1', 'NM_000118.3']

In [15]: var_c = am.g_to_c(var_g, 'NM_001114753.2')

In [16]: var_p = am.c_to_p(var_c)

In [17]: var_c Out[17]: SequenceVariant(ac=NM_001114753.2, type=c, posedit=1977G>A)

In [18]: var_p Out[18]: SequenceVariant(ac=NP_001108225.1, type=p, posedit=(Ter659Ter)

akeeeshi avatar Sep 04 '19 14:09 akeeeshi

For background, see #333 and https://github.com/biocommons/hgvs/issues/2#issuecomment-285511419.

So, what's happening here is pretty easy to understand and difficult to fix correctly. As a consequence of the AA grammar (but not the NA grammar) interleaving locations and bases, there are really two places one might store the reference base: in the position and/or in the edit. So, in this case, the location AA is "Ter", the edit ref is "", and the edit alt is "Ter". Because the ref == alt is False, we don't print a "=".

There are a few ways to fix this. By far the best way is to reunify the AARefAlt and NARefAlt models, but that will have significant complexity implications for the grammar. Another good way is to ensure that the AA ref attribute is set (even though it's unused). The easy way is probably to just explicitly check for this case in posedit::format and hack the fix there.

I want to think about this before acting.

reece avatar Sep 06 '19 00:09 reece

Of course Reece, that makers total sense. The RefSeq Protein reference sequence records do not display the Ter (*). Hence the reference is most certainly ''

I agree that certainly for our purposes, a string replace is the best option. I prefer not to touch the hgvs.py models leaving that up to those that fully understand them.

The code snippet that seems to work for me is

                if re.search('Ter\d+Ter', str(hgvs_protein.posedit):
                    posedit = str(hgvs_protein.posedit)
                    posedit = posedit[:-4] + '=)'
                    hgvs_protein.posedit = posedit

I have tested the solution and it seems robust. Also, and most reassuringly, hgvs.py can convert this into the Single letter aa code syntax.

It’s a low tech solution Reece, not sure if it’s any good for you.

I’m still thinking my way around https://github.com/biocommons/hgvs/issues/566

In the situation @akeeeshi mentions, I think Met1= would be correct (I am going to check the reference sequences, but again, struggling to connect). However, I’m not sure it’s safe to output anything other than (Met1?). For example, a delins or inversion could restore/maintain the start codon resulting in a variant Met1=, but may disrupt something else such as splicing. In a non ATG start codon, a variant could create an ATG, but may not be able to initiate translation. I have been thinking about the best way to go, but I keep getting niggles that anything affecting the structure of the translation initiation region would have an unpredictable effect. In this case, it is likely the genome is incorrect and the transcript and protein reference sequences are correct, but we can’t be sure.

What do you think?

Cross-reference to here https://github.com/openvar/variantValidator/issues/87

Peter-J-Freeman avatar Sep 06 '19 07:09 Peter-J-Freeman

Sure, regexp method will work. I considered the same thing as a hack. I might implement that if other solutions are really too onerous.

reece avatar Sep 06 '19 15:09 reece

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Dec 21 '23 01:12 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Dec 29 '23 01:12 github-actions[bot]

This issue was closed by stalebot. It has been reopened to give more time for community review. See biocommons coding guidelines for stale issue and pull request policies. This resurrection is expected to be a one-time event.

reece avatar Feb 19 '24 17:02 reece