
Fix ins or dups where splice region is preserved

Open: b0d0nne11 opened this issue 1 year ago • 10 comments

Fixes #714.

Fixes ins or dup variants spanning the intron/exon or exon/intron boundary where the splice site & region remain completely intact.

b0d0nne11, Jan 23 '24

We found some examples of duplications where the original logic here didn't shift the variant far enough to get the expected result. It turned out that for these variants the shifted version can't be written as a duplication, since in a duplication the alt always follows the ref. I've added some logic to rewrite these shifted variants as insertions before attempting to map them back to var_ps, and added tests covering these cases.
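A minimal sketch of why the shifted variant has to become an insertion, in plain Python on a toy reference string rather than the hgvs API. The function names and the 0-based interbase coordinates are assumptions for illustration. A dup of ref[start:end] is the same change as inserting those bases right after `end`; once that insertion is shifted 5' (left), the inserted copy can land *before* the matching reference bases, so it no longer fits the dup pattern.

```python
def dup_to_insertion(ref: str, start: int, end: int) -> tuple[int, str]:
    """Rewrite a dup of ref[start:end] (0-based, half-open) as an insertion.

    Returns (pos, bases): insert `bases` at interbase position `pos`,
    i.e. between ref[pos - 1] and ref[pos]. A dup places the copy
    immediately after the original, so pos == end.
    """
    return end, ref[start:end]


def shift_left(ref: str, pos: int, bases: str) -> tuple[int, str]:
    """Shift an insertion 5' (left) as far as the reference allows."""
    while pos > 0 and ref[pos - 1] == bases[-1]:
        # Rotating the inserted bases keeps the resulting sequence identical.
        bases = bases[-1] + bases[:-1]
        pos -= 1
    return pos, bases


def expressible_as_dup(ref: str, pos: int, bases: str) -> bool:
    """A dup requires the inserted bases to follow an identical reference copy."""
    return pos >= len(bases) and ref[pos - len(bases):pos] == bases


def apply_insertion(ref: str, pos: int, bases: str) -> str:
    return ref[:pos] + bases + ref[pos:]


ref = "ATGCCAGCAGCAGTTTTT"
pos, bases = dup_to_insertion(ref, 10, 13)  # dup of ref[10:13] == "CAG"
left = shift_left(ref, pos, bases)

print((pos, bases), expressible_as_dup(ref, pos, bases))  # (13, 'CAG') True
print(left, expressible_as_dup(ref, *left))               # (4, 'CAG') False
assert apply_insertion(ref, pos, bases) == apply_insertion(ref, *left)
```

Both placements produce the same alternate sequence; only the 3'-most one can be described as a dup, which is why the left-shifted form is kept as an insertion.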

b0d0nne11, Feb 07 '24

High-level explanation of this pull request:

  • HGVS nomenclature has the 3' shifting rule, so all cdots and pdots are shifted as far 3' (to the right) as possible
  • However, 3' shifting is an arbitrary convention, a necessary evil for nomenclature purposes; biology doesn't care about the 3' shifting rule
  • Consider an example: in a positive-strand gene, the first 8 bases of an intron are duplicated
  • The cdot would be +1_+8dup (offsets relative to the last base of the exon)
  • Currently no pdot is calculated, because both positions have an intronic offset
  • But now consider the biology: after the duplication, there are 2 splice (donor) sites at the left end of the intron. Which one is more likely to be used for splicing?
  • We obviously can't know for sure, but it seems to me that the most logical assumption is that the "inner" splice site will be used, leaving the extra inserted material within the coding sequence and resulting in a frameshift pdot (8 bases is not a multiple of 3). Why is this the best assumption?
    • When that splice site is used, the entire intronic sequence is intact
    • It seems "safer" to calculate a pdot to rescue the variant; otherwise, downstream applications will most likely filter it out as an insertion after the 8th position of the intron
  • To handle this type of situation as generally as possible, the approach is:
    • Calculate the pdot the normal way.
    • If it is empty, shift the cdot in the REVERSE direction and calculate the pdot again. If you get a result, use it.
    • Basically, if a pdot can be obtained with either forward or reverse shifting, the entire intron is intact and we should report the pdot.
  • Note that the 3' shifting rule is still respected for both the cdot and the pdot nomenclature
    • Reverse shifting is only used as a tool when calculating the pdot from the cdot, which does not violate HGVS nomenclature
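The approach above can be sketched on a toy model in plain Python. This is not the hgvs implementation: it assumes a reference string with 0-based interbase insertion coordinates, a single exon given as a half-open interval, and it treats "a pdot can be computed" as "both bases flanking the insertion are exonic". All function names are hypothetical.

```python
from typing import Optional

# (interbase position, inserted bases); position p means "between ref[p-1] and ref[p]"
Insertion = tuple[int, str]


def shift_right(ref: str, pos: int, bases: str) -> Insertion:
    """Shift an insertion 3' as far as possible (the HGVS 3' rule)."""
    while pos < len(ref) and ref[pos] == bases[0]:
        bases = bases[1:] + bases[0]
        pos += 1
    return pos, bases


def shift_left(ref: str, pos: int, bases: str) -> Insertion:
    """Shift an insertion 5' as far as possible (the "reverse" shift)."""
    while pos > 0 and ref[pos - 1] == bases[-1]:
        bases = bases[-1] + bases[:-1]
        pos -= 1
    return pos, bases


def inside_exon(pos: int, exon_start: int, exon_end: int) -> bool:
    """Both flanking bases lie in the exon [exon_start, exon_end), i.e. no offset."""
    return exon_start < pos < exon_end


def exonic_placement(
    ref: str, pos: int, bases: str, exon_start: int, exon_end: int
) -> Optional[Insertion]:
    """Toy stand-in for "can a pdot be computed?".

    Try the normal 3'-shifted placement first; only if that touches the
    intron, try the 5'-shifted placement. The reported cdot is not changed.
    """
    for candidate in (shift_right(ref, pos, bases), shift_left(ref, pos, bases)):
        if inside_exon(candidate[0], exon_start, exon_end):
            return candidate
    return None


def is_frameshift(bases: str) -> bool:
    """An exonic insertion whose length is not a multiple of 3 shifts the frame."""
    return len(bases) % 3 != 0


# Toy transcript: exon = ref[0:10], intron starts at ref[10].
ref = "ATGCCAGCAGCAGTTTTT"
print(shift_right(ref, 10, "CAG"))              # (13, 'CAG'): intronic, no pdot
print(exonic_placement(ref, 10, "CAG", 0, 10))  # (4, 'CAG'): exonic after reverse shift
print(exonic_placement(ref, 15, "TT", 0, 10))   # None: intronic either way
```

The forward shift is always tried first, so variants that already get a pdot are unaffected; the reverse shift only rescues insertions whose 3'-most placement touches the intron.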

gostachowiak, Feb 07 '24

I also wanted to mention that we discovered this because some fraction of FLT3 ITDs (internal tandem duplications) currently get missed by the hgvs package (including one added to the unit tests), so this is a high-impact issue.

gostachowiak, Feb 08 '24

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot], Mar 11 '24

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot], Apr 11 '24

@reece or @ahwagner, can you remove the stale label here? We would still like to get this merged if possible. Thanks!

b0d0nne11, Apr 12 '24

We found a case where trying to map the shifted variant raises an HGVSInvalidVariantError. I've added logic to handle this, along with a test case. The variant is NM_182758.2:c.2953-31_2953-26dup. As part of the shifting procedure, mapping it to g. (genomic) coordinates yielded an unexpected transformation to NC_000015.9:g.53815545_53815550delinsC, which caused problems in later steps. I'm simply handling the error here, since we don't want to consider invalid variants.
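A sketch of the fallback with the error handling described above, written against hypothetical callables (`to_pdot`, `shift_5prime`) rather than the real hgvs mapper API. `InvalidVariantError` is a local stand-in for the HGVSInvalidVariantError mentioned above; only the reverse-shift attempt is guarded, so errors from the normal path still propagate.

```python
from typing import Callable, Optional, TypeVar

V = TypeVar("V")  # variant type (e.g. a cdot)
P = TypeVar("P")  # protein-level result type (e.g. a pdot)


class InvalidVariantError(Exception):
    """Hypothetical stand-in for hgvs's HGVSInvalidVariantError."""


def pdot_with_reverse_shift(
    var_c: V,
    to_pdot: Callable[[V], Optional[P]],
    shift_5prime: Callable[[V], V],
) -> Optional[P]:
    """Compute the pdot normally; fall back to a 5'-shifted copy of the cdot.

    An invalid variant produced while shifting or mapping the copy is treated
    as "no pdot" instead of being propagated.
    """
    var_p = to_pdot(var_c)
    if var_p is not None:
        return var_p
    try:
        return to_pdot(shift_5prime(var_c))
    except InvalidVariantError:
        return None
```

For example, a `shift_5prime` that raises `InvalidVariantError` simply yields `None`, matching the choice not to consider invalid variants.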

b0d0nne11, May 24 '24

Also rebased on main

b0d0nne11, May 24 '24

Rebased on main

b0d0nne11, Jun 04 '24