hgvs
hgvs copied to clipboard
In-frame del at the end of a transcript throws Index Error
c_to_p fails for variant "NM_198180.2:c.408_410delGTG".
This is in-frame deletion that involves the termination codon, so I believe it is similar to issues #476, #449 and #525
To reproduce in hgvs-shell:
In [1]: v = hp.parse_hgvs_variant("NM_198180.2:c.408_410delGTG")
In [2]: am37.c_to_p(v)
INFO:biocommons.seqrepo.fastadir.fastadir:Opening for reading: /usr/local/share/seqrepo/2017-10-26/sequences/2016/0824/050145/1472014905.920233.fa.bgz
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/usr/local/lib/python2.7/site-packages/hgvs/shell.pyc in <module>()
----> 1 am37.c_to_p(v)
/usr/local/lib/python2.7/site-packages/hgvs/assemblymapper.pyc in c_to_p(self, var_c)
115
116 def c_to_p(self, var_c):
--> 117 var_out = super(AssemblyMapper, self).c_to_p(var_c)
118 return self._maybe_normalize(var_out)
119
/usr/local/lib/python2.7/site-packages/hgvs/variantmapper.pyc in c_to_p(self, var_c, pro_ac)
357 for alt_data in all_alt_data:
358 builder = altseq_to_hgvsp.AltSeqToHgvsp(reference_data, alt_data)
--> 359 var_p = builder.build_hgvsp()
360 var_ps.append(var_p)
361
/usr/local/lib/python2.7/site-packages/hgvs/utils/altseq_to_hgvsp.pyc in build_hgvsp(self)
93 start = self._alt_data.variant_start_aa - 1
94 delta = len(self._alt_seq) - len(self._ref_seq)
---> 95 while self._ref_seq[start] == self._alt_seq[start]:
96 start += 1
97 offset = start + abs(delta)
IndexError: string index out of range
Confirmed on master (hgvs 1.2.5). :-( I agree that this is likely related to the issues you saw previously.
I'm going to take a stab at this. I might have a followup question for you.
Hi Reece and Paige,
I have been taking a look at this error. Sorry if you have moved on, but it’s particularly interesting.
Other than giving a warning, I’m not sure what you will be able to do with the c_to_p translation for this variant. The final few bases of the reference sequence look like this
WT GAAGAAAGGCGGCTTCAGCTTCCGCTTCGGTCGGCGGTGA ALT GAAGAAAGGCGGCTTCAGCTTCCGCTTCGGTCGGCG---A
There is no 3 prime UTR nor is there a specific poly_A signal annotated in the record.
The index error is being caused due to there being an insufficient number of bases to complete a codon i.e. the final coon of the variant sequence would be GA- (where – is an absent base). Therefore, I don’t think the error is related to #476https://github.com/biocommons/hgvs/issues/476, #449https://github.com/biocommons/hgvs/issues/449 and #525https://github.com/biocommons/hgvs/issues/525.
I’m also not sure what the p. prediction would be. Generally, we lost the termination codon, we would in most instances expect to encounter a termination codon in the 3 prime UTR e.g. NM_000517.4:c.427T>C would map to NP_000508.1:p.(Ter143GlnextTer31). However, because we cannot create a final codon due to a short reference sequence, this format will not work. Perhaps p.(Ter
What do you think?
Cheers
Pete
From: Paige [email protected] Reply-To: biocommons/hgvs [email protected] Date: Monday, 22 October 2018 at 19:41 To: biocommons/hgvs [email protected] Cc: Subscribed [email protected] Subject: [biocommons/hgvs] In-frame del at the end of a transcript throws Index Error (#529)
c_to_p fails for variant "NM_198180.2:c.408_410delGTG". This is in-frame deletion that involves the termination codon, so I believe it is similar to issues #476https://github.com/biocommons/hgvs/issues/476, #449https://github.com/biocommons/hgvs/issues/449 and #525https://github.com/biocommons/hgvs/issues/525 [Image removed by sender. image]https://user-images.githubusercontent.com/909335/47311635-276aa280-d5ef-11e8-9686-dd537e187112.png
To reproduce in hgvs-shell:
In [1]: v = hp.parse_hgvs_variant("NM_198180.2:c.408_410delGTG")
In [2]: am37.c_to_p(v)
INFO:biocommons.seqrepo.fastadir.fastadir:Opening for reading: /usr/local/share/seqrepo/2017-10-26/sequences/2016/0824/050145/1472014905.920233.fa.bgz
IndexError Traceback (most recent call last)
/usr/local/lib/python2.7/site-packages/hgvs/shell.pyc in
----> 1 am37.c_to_p(v)
/usr/local/lib/python2.7/site-packages/hgvs/assemblymapper.pyc in c_to_p(self, var_c)
115
116 def c_to_p(self, var_c):
--> 117 var_out = super(AssemblyMapper, self).c_to_p(var_c)
118 return self._maybe_normalize(var_out)
119
/usr/local/lib/python2.7/site-packages/hgvs/variantmapper.pyc in c_to_p(self, var_c, pro_ac)
357 for alt_data in all_alt_data:
358 builder = altseq_to_hgvsp.AltSeqToHgvsp(reference_data, alt_data)
--> 359 var_p = builder.build_hgvsp()
360 var_ps.append(var_p)
361
/usr/local/lib/python2.7/site-packages/hgvs/utils/altseq_to_hgvsp.pyc in build_hgvsp(self)
93 start = self._alt_data.variant_start_aa - 1
94 delta = len(self._alt_seq) - len(self._ref_seq)
---> 95 while self._ref_seq[start] == self._alt_seq[start]:
96 start += 1
97 offset = start + abs(delta)
IndexError: string index out of range
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/biocommons/hgvs/issues/529, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AZVoub4O5s4eKJsDHKmRcHnFptn-K8jMks5unhFJgaJpZM4Xz8JK.
Hey @PeteCausey-Freeman
Thank you for weighing in and providing that detailed explanation, I agree. I originally thought it would be related to those other issues affecting the Ter but after digging in, I realized it was a bit trickier. The absence of a 3'UTR explains it. It's also probably why the protein naming has stumped other tools: an old version of Alamut predicted the protein effect as p.Arg136_137ins34, the new version has it as p.137ext?, and Mutalyzer goes with: p.(=) https://mutalyzer.nl/name-checker?description=NM_198180.2%3Ac.408_410delGTG.
I am tagging @naomifox since she was taking a stab at it. I agree that p.? seems to be the safest way to go.
Great.
Happy to help.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
This issue was closed by stalebot. It has been reopened to give more time for community review. See biocommons coding guidelines for stale issue and pull request policies. This resurrection is expected to be a one-time event.