Formatting of extensions does not follow spec
Thanks for your work on this important tool!
Running v1.5.6 installed via bioconda:
```python
from hgvs.edit import AAExt
from hgvs.location import AAPosition, Interval
from hgvs.posedit import PosEdit

# N-terminal extension: Met1 with a new start 2 residues upstream
print(
    PosEdit(
        pos=Interval(
            start=AAPosition(base=1, aa="M"),
            end=AAPosition(base=1, aa="M"),
        ),
        edit=AAExt(ref="M", aaterm="LGM", length=-2),
    )
)
```
Returns: Met1extLeuGlyMet-2
Based on the spec (21.1.3, https://hgvs-nomenclature.org/stable/recommendations/protein/extension/), an N-terminal extension should be represented as:
Met1ext-2
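To make the expected N-terminal form concrete, here is a minimal sketch of a hypothetical formatter (`format_n_terminal_ext` is an illustrative name, not part of the hgvs package). It assumes, following the form quoted above, that only the reference residue, its position, and the negative offset of the new upstream start are written, and that the added upstream residues are not listed.

```python
def format_n_terminal_ext(ref_aa: str, pos: int, length: int) -> str:
    """Hypothetical helper: format an N-terminal extension, e.g. Met1ext-2.

    Only the residue, its position and the (negative) offset of the new
    upstream start appear; the new upstream residues are not listed.
    """
    if length >= 0:
        raise ValueError("an N-terminal extension needs a negative (upstream) offset")
    return f"{ref_aa}{pos}ext{length}"


print(format_n_terminal_ext("Met", 1, -2))  # Met1ext-2
```

Compared with the hgvs output above, the difference is that the `aaterm` residues (`LeuGlyMet`) should not appear at all.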
I think either the spec or the docs likely changed, as I see this in the Examples section:
I also think C-terminal extensions are affected:
```python
from hgvs.edit import AAExt
from hgvs.location import AAPosition, Interval
from hgvs.posedit import PosEdit

# C-terminal (stop-loss) extension: Ter10 replaced, new stop 2 residues later
print(
    PosEdit(
        pos=Interval(
            start=AAPosition(base=10, aa="*"),
            end=AAPosition(base=10, aa="*"),
        ),
        edit=AAExt(ref="*", aaterm="LG*", length=2),
    )
)
```
Returns: Ter10extLeuGlyTer2
But it should be: Ter10LeuextTer2
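A similar sketch for the C-terminal case. `format_c_terminal_ext` and `THREE_LETTER` are hypothetical names for illustration, not hgvs API. The sketch assumes `aaterm` holds the one-letter residues starting at the former stop position and ending with the new stop, as in the example above, and counts the new stop's position the way the expected output does (Leu replaces the old stop, Gly is 1, the new Ter is 2).

```python
# Three-letter codes for the 20 standard amino acids plus the stop symbol.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def format_c_terminal_ext(pos: int, aaterm: str) -> str:
    """Hypothetical helper: format a stop-loss extension, e.g. Ter10LeuextTer2.

    Assumes aaterm starts with the residue replacing the lost stop and
    ends with the single new stop ("*").
    """
    if len(aaterm) < 2 or not aaterm.endswith("*") or aaterm.count("*") != 1:
        raise ValueError("aaterm must be a residue sequence ending in exactly one '*'")
    # The residue that replaces the old stop is written before "ext".
    new_aa = THREE_LETTER[aaterm[0]]
    # That residue is position 0 of the extension, so the new stop is at len - 1.
    new_stop_pos = len(aaterm) - 1
    return f"Ter{pos}{new_aa}extTer{new_stop_pos}"


print(format_c_terminal_ext(10, "LG*"))  # Ter10LeuextTer2
```

Here the replacing residue moves in front of `ext`, and only the position of the new stop follows it, rather than the full `LeuGlyTer` run.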
Thanks again!
@schorlton-bugseq Thank you for the lucid bug report and example! I agree that this is a bug, and suspect that you're correct that this is a spec change. (The recent overhaul of the spec started a migration toward becoming a bit more pedantic about structuring and versioning HGVS using a formal grammar.)