Projecting indels across gap causes length change and non-reversibility
I understand a variant growing when the destination reference has an insertion, but shouldn't it shrink back when it is projected the other way?
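# Assumption: the helpers used below are the ones exported by hgvs.easy
from hgvs.easy import parse, normalize, c_to_g, g_to_c

original_hgvs = "NM_015120.4(ALMS1):c.36_38dupGGA"

def print_hgvs(sv):
    # end - start is the distance between the endpoints, so c.36_38 prints length=2
    length = sv.posedit.pos.end - sv.posedit.pos.start
    print(f"hgvs='{sv}' - {length=}")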
var_c = parse(original_hgvs)
print_hgvs(var_c)
var_g = c_to_g(var_c)
print_hgvs(var_g)
var_c2 = g_to_c(var_g, var_c.ac)
print_hgvs(var_c2)
Output:
hgvs='NM_015120.4(ALMS1):c.36_38dup' - length=2
hgvs='NC_000002.12:g.73385937_73385942dup' - length=5
hgvs='NM_015120.4:c.72_77dup' - length=5
Normalization?
I noticed that if you normalize the variant first, the problem goes away.
I think this is because normalization shifts the variant away from the gap, but that shouldn't matter. If normalizing before projection is required, perhaps we should do it automatically, or raise a warning or error when the variant is not normalized?
var_c_orig = parse(original_hgvs)
var_c = normalize(var_c_orig)
print(f"Normalized: {var_c_orig} => {var_c}")
print_hgvs(var_c)
var_g = c_to_g(var_c)
print_hgvs(var_g)
var_c2 = g_to_c(var_g, var_c.ac)
print_hgvs(var_c2)
Output:
Normalized: NM_015120.4(ALMS1):c.36_38dup => NM_015120.4:c.75_77dup
hgvs='NM_015120.4:c.75_77dup' - length=2
hgvs='NC_000002.12:g.73385940_73385942dup' - length=2
hgvs='NM_015120.4:c.75_77dup' - length=2
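To show why normalization can move c.36_38dup all the way to c.75_77dup, here is a minimal sketch of 3' shifting a duplication inside a tandem repeat. It is a toy model, not the hgvs normalizer: the reference string, the 0-based half-open coordinates, and the function names `shift_dup_3prime` and `apply_dup` are all made up for illustration.

```python
def shift_dup_3prime(ref: str, start: int, end: int) -> tuple[int, int]:
    """Slide the duplicated interval ref[start:end] as far 3' as it can go.

    Coordinates are 0-based and half-open (hypothetical convention for this toy).
    Moving one base to the right leaves the edited sequence unchanged exactly
    when the base leaving the window equals the base entering it.
    """
    while end < len(ref) and ref[start] == ref[end]:
        start += 1
        end += 1
    return start, end


def apply_dup(ref: str, start: int, end: int) -> str:
    """Return the sequence produced by duplicating ref[start:end] in place."""
    return ref[:end] + ref[start:end] + ref[end:]


# Toy reference: four GGA units flanked by unrelated bases.
ref = "AT" + "GGA" * 4 + "CT"

first_unit = (2, 5)                          # dup of the first GGA unit
shifted = shift_dup_3prime(ref, *first_unit)  # lands on the last GGA unit

# Both descriptions produce the same edited sequence.
same = apply_dup(ref, *first_unit) == apply_dup(ref, *shifted)
print(shifted, same)
```

In a repeat like this, a dup of any one unit is indistinguishable from a dup of any other, so the 3'-most description is only a convention. That would be consistent with the normalized variant above ending up clear of the gap.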
Note: while searching the issues, I found a discussion of alignment gaps (on this very transcript!) in #514.
To take normalization out of the picture, I made the duplication so large that it could not shift, and projection turned it from a dup into an ins:
original_hgvs = "NM_015120.4(ALMS1):c.36_77dup"
var_c_orig = parse(original_hgvs)
var_c = normalize(var_c_orig)
print(f"Normalized: {var_c_orig} => {var_c}")
print_hgvs(var_c)
var_g = c_to_g(var_c)
print_hgvs(var_g)
var_c2 = g_to_c(var_g, var_c.ac)
print_hgvs(var_c2)
Output:
Normalized: NM_015120.4(ALMS1):c.36_77dup => NM_015120.4:c.36_77dup
hgvs='NM_015120.4:c.36_77dup' - length=41
hgvs='NC_000002.12:g.73385942_73385943insGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGA'
hgvs='NM_015120.4:c.77_78insGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGAGGA'
So yes, I think normalization was just hiding the problem before.
I understand the change going one way, but I wonder whether the conversion back is wrong, or whether there should definitely be a warning here.
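A rough sketch of what such a warning could look like: a round-trip check that projects a variant out and back and warns when it does not come back unchanged. `check_round_trip` is a hypothetical helper, not part of hgvs. The projection functions are passed in as callables so the check can be exercised here with lookup tables built from the outputs recorded above (gene symbol dropped so the strings are comparable).

```python
import warnings
from typing import Callable


def check_round_trip(
    var_c: str,
    to_g: Callable[[str], str],
    to_c: Callable[[str], str],
) -> bool:
    """Project c. -> g. -> c. and warn if the variant does not come back unchanged."""
    var_g = to_g(var_c)
    var_c2 = to_c(var_g)
    if var_c2 != var_c:
        warnings.warn(
            f"Round trip changed variant: {var_c} -> {var_g} -> {var_c2}",
            stacklevel=2,
        )
        return False
    return True


# Stand-ins for the real projections, filled with the results shown in this issue.
C_TO_G = {
    "NM_015120.4:c.36_38dup": "NC_000002.12:g.73385937_73385942dup",
    "NM_015120.4:c.75_77dup": "NC_000002.12:g.73385940_73385942dup",
}
G_TO_C = {
    "NC_000002.12:g.73385937_73385942dup": "NM_015120.4:c.72_77dup",
    "NC_000002.12:g.73385940_73385942dup": "NM_015120.4:c.75_77dup",
}

if __name__ == "__main__":
    # Un-normalized input: warns and returns False.
    print(check_round_trip("NM_015120.4:c.36_38dup", C_TO_G.__getitem__, G_TO_C.__getitem__))
    # Normalized input: round-trips cleanly.
    print(check_round_trip("NM_015120.4:c.75_77dup", C_TO_G.__getitem__, G_TO_C.__getitem__))
```

With the real library, the two callables would presumably wrap parse, c_to_g and g_to_c and compare string forms. Whether string equality is the right test, as opposed to comparing normalized variants, is an open design question.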