hgvs icon indicating copy to clipboard operation
hgvs copied to clipboard

In-frame del at the end of a transcript throws Index Error

Open sptaylor opened this issue 6 years ago • 9 comments

c_to_p fails for variant "NM_198180.2:c.408_410delGTG". This is in-frame deletion that involves the termination codon, so I believe it is similar to issues #476, #449 and #525 image

To reproduce in hgvs-shell:

In [1]: v = hp.parse_hgvs_variant("NM_198180.2:c.408_410delGTG")

In [2]: am37.c_to_p(v)
INFO:biocommons.seqrepo.fastadir.fastadir:Opening for reading: /usr/local/share/seqrepo/2017-10-26/sequences/2016/0824/050145/1472014905.920233.fa.bgz
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/usr/local/lib/python2.7/site-packages/hgvs/shell.pyc in <module>()
----> 1 am37.c_to_p(v)

/usr/local/lib/python2.7/site-packages/hgvs/assemblymapper.pyc in c_to_p(self, var_c)
    115
    116     def c_to_p(self, var_c):
--> 117         var_out = super(AssemblyMapper, self).c_to_p(var_c)
    118         return self._maybe_normalize(var_out)
    119

/usr/local/lib/python2.7/site-packages/hgvs/variantmapper.pyc in c_to_p(self, var_c, pro_ac)
    357         for alt_data in all_alt_data:
    358             builder = altseq_to_hgvsp.AltSeqToHgvsp(reference_data, alt_data)
--> 359             var_p = builder.build_hgvsp()
    360             var_ps.append(var_p)
    361

/usr/local/lib/python2.7/site-packages/hgvs/utils/altseq_to_hgvsp.pyc in build_hgvsp(self)
     93                     start = self._alt_data.variant_start_aa - 1
     94                     delta = len(self._alt_seq) - len(self._ref_seq)
---> 95                     while self._ref_seq[start] == self._alt_seq[start]:
     96                         start += 1
     97                     offset = start + abs(delta)

IndexError: string index out of range

sptaylor avatar Oct 22 '18 18:10 sptaylor

Confirmed on master (hgvs 1.2.5). :-( I agree that this is likely related to the issues you saw previously.

reece avatar Oct 23 '18 15:10 reece

I'm going to take a stab at this. I might have a followup question for you.

sptaylor avatar Oct 23 '18 16:10 sptaylor

Hi Reece and Paige,

I have been taking a look at this error. Sorry if you have moved on, but it’s particularly interesting.

Other than giving a warning, I’m not sure what you will be able to do with the c_to_p translation for this variant. The final few bases of the reference sequence look like this

WT GAAGAAAGGCGGCTTCAGCTTCCGCTTCGGTCGGCGGTGA ALT GAAGAAAGGCGGCTTCAGCTTCCGCTTCGGTCGGCG---A

There is no 3 prime UTR nor is there a specific poly_A signal annotated in the record.

The index error is being caused due to there being an insufficient number of bases to complete a codon i.e. the final coon of the variant sequence would be GA- (where – is an absent base). Therefore, I don’t think the error is related to #476https://github.com/biocommons/hgvs/issues/476, #449https://github.com/biocommons/hgvs/issues/449 and #525https://github.com/biocommons/hgvs/issues/525.

I’m also not sure what the p. prediction would be. Generally, we lost the termination codon, we would in most instances expect to encounter a termination codon in the 3 prime UTR e.g. NM_000517.4:c.427T>C would map to NP_000508.1:p.(Ter143GlnextTer31). However, because we cannot create a final codon due to a short reference sequence, this format will not work. Perhaps p.(Ter?) could work, but I think p.? would be the safest outcome because at face value based on the transcript reference sequence, we would likely predict Non_STOP_DECAY and no protein.

What do you think?

Cheers

Pete

From: Paige [email protected] Reply-To: biocommons/hgvs [email protected] Date: Monday, 22 October 2018 at 19:41 To: biocommons/hgvs [email protected] Cc: Subscribed [email protected] Subject: [biocommons/hgvs] In-frame del at the end of a transcript throws Index Error (#529)

c_to_p fails for variant "NM_198180.2:c.408_410delGTG". This is in-frame deletion that involves the termination codon, so I believe it is similar to issues #476https://github.com/biocommons/hgvs/issues/476, #449https://github.com/biocommons/hgvs/issues/449 and #525https://github.com/biocommons/hgvs/issues/525 [Image removed by sender. image]https://user-images.githubusercontent.com/909335/47311635-276aa280-d5ef-11e8-9686-dd537e187112.png

To reproduce in hgvs-shell:

In [1]: v = hp.parse_hgvs_variant("NM_198180.2:c.408_410delGTG")

In [2]: am37.c_to_p(v)

INFO:biocommons.seqrepo.fastadir.fastadir:Opening for reading: /usr/local/share/seqrepo/2017-10-26/sequences/2016/0824/050145/1472014905.920233.fa.bgz


IndexError Traceback (most recent call last)

/usr/local/lib/python2.7/site-packages/hgvs/shell.pyc in ()

----> 1 am37.c_to_p(v)

/usr/local/lib/python2.7/site-packages/hgvs/assemblymapper.pyc in c_to_p(self, var_c)

115

116     def c_to_p(self, var_c):

--> 117 var_out = super(AssemblyMapper, self).c_to_p(var_c)

118         return self._maybe_normalize(var_out)

119

/usr/local/lib/python2.7/site-packages/hgvs/variantmapper.pyc in c_to_p(self, var_c, pro_ac)

357         for alt_data in all_alt_data:

358             builder = altseq_to_hgvsp.AltSeqToHgvsp(reference_data, alt_data)

--> 359 var_p = builder.build_hgvsp()

360             var_ps.append(var_p)

361

/usr/local/lib/python2.7/site-packages/hgvs/utils/altseq_to_hgvsp.pyc in build_hgvsp(self)

 93                     start = self._alt_data.variant_start_aa - 1

 94                     delta = len(self._alt_seq) - len(self._ref_seq)

---> 95 while self._ref_seq[start] == self._alt_seq[start]:

 96                         start += 1

 97                     offset = start + abs(delta)

IndexError: string index out of range

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/biocommons/hgvs/issues/529, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AZVoub4O5s4eKJsDHKmRcHnFptn-K8jMks5unhFJgaJpZM4Xz8JK.

Peter-J-Freeman avatar Oct 29 '18 14:10 Peter-J-Freeman

Hey @PeteCausey-Freeman

Thank you for weighing in and providing that detailed explanation, I agree. I originally thought it would be related to those other issues affecting the Ter but after digging in, I realized it was a bit trickier. The absence of a 3'UTR explains it. It's also probably why the protein naming has stumped other tools: an old version of Alamut predicted the protein effect as p.Arg136_137ins34, the new version has it as p.137ext?, and Mutalyzer goes with: p.(=) https://mutalyzer.nl/name-checker?description=NM_198180.2%3Ac.408_410delGTG.

I am tagging @naomifox since she was taking a stab at it. I agree that p.? seems to be the safest way to go.

sptaylor avatar Oct 29 '18 15:10 sptaylor

Great.

Happy to help.

Peter-J-Freeman avatar Oct 29 '18 15:10 Peter-J-Freeman

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Dec 23 '23 01:12 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Dec 30 '23 01:12 github-actions[bot]

This issue was closed by stalebot. It has been reopened to give more time for community review. See biocommons coding guidelines for stale issue and pull request policies. This resurrection is expected to be a one-time event.

reece avatar Feb 19 '24 17:02 reece