Formatting of extensions does not follow spec

Open schorlton-bugseq opened this issue 2 months ago • 1 comment

Thanks for your work on this important tool!

Running v1.5.6 installed via bioconda:

from hgvs.edit import AAExt
from hgvs.location import AAPosition, Interval
from hgvs.posedit import PosEdit

# N-terminal extension at Met1 (length=-2)
print(
    PosEdit(
        pos=Interval(
            start=AAPosition(base=1, aa="M"),
            end=AAPosition(base=1, aa="M"),
        ),
        edit=AAExt(ref="M", aaterm="LGM", length=-2),
    )
)

Returns: Met1extLeuGlyMet-2

Based on spec 21.1.3 (https://hgvs-nomenclature.org/stable/recommendations/protein/extension/), an N-terminus extension should be represented as: Met1ext-2
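To make the expected output concrete, here is a minimal sketch of the N-terminal format described above. The helper `format_nterm_ext` and the `AA1_TO_AA3` table are hypothetical names for illustration only; they are not part of the hgvs package and do not reflect how `AAExt` is implemented.

```python
# Hypothetical helper, NOT part of the hgvs package: illustrates the
# expected string for an N-terminal extension, <ref><pos>ext<length>.
AA1_TO_AA3 = {"M": "Met", "L": "Leu", "G": "Gly", "*": "Ter"}  # subset for this example


def format_nterm_ext(ref: str, pos: int, length: int) -> str:
    """Format an N-terminal extension; length is the (negative) new start offset."""
    if length >= 0:
        raise ValueError("N-terminal extension length must be negative")
    # The inserted residues are not spelled out, only the offset.
    return f"{AA1_TO_AA3[ref]}{pos}ext{length}"


print(format_nterm_ext("M", 1, -2))  # Met1ext-2
```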

I think they likely changed either the spec or the docs as I see this in Examples:

[Image: screenshot of the Examples mentioned above]

I also think C-terminus extensions are impacted:

from hgvs.edit import AAExt
from hgvs.location import AAPosition, Interval
from hgvs.posedit import PosEdit

# C-terminal extension at Ter10 (length=2)
print(
    PosEdit(
        pos=Interval(
            start=AAPosition(base=10, aa="*"),
            end=AAPosition(base=10, aa="*"),
        ),
        edit=AAExt(ref="*", aaterm="LG*", length=2),
    )
)

Returns: Ter10extLeuGlyTer2

But it should be: Ter10LeuextTer2
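A minimal sketch of the C-terminal format expected above, where only the residue replacing the stop codon is named and the new stop position follows `extTer`. The helper `format_cterm_ext` and the `AA1_TO_AA3` table are hypothetical names for illustration; they are not part of the hgvs package.

```python
# Hypothetical helper, NOT part of the hgvs package: illustrates the
# expected string for a C-terminal extension, Ter<pos><alt>extTer<length>.
AA1_TO_AA3 = {"M": "Met", "L": "Leu", "G": "Gly", "*": "Ter"}  # subset for this example


def format_cterm_ext(pos: int, alt: str, length: int) -> str:
    """Format a C-terminal extension; alt is the residue replacing the stop codon."""
    if length <= 0:
        raise ValueError("C-terminal extension length must be positive")
    # Only the first new residue is named; the rest is implied by the length.
    return f"Ter{pos}{AA1_TO_AA3[alt]}extTer{length}"


print(format_cterm_ext(10, "L", 2))  # Ter10LeuextTer2
```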

Thanks again!

schorlton-bugseq, Oct 10 '25 14:10

@schorlton-bugseq Thank you for the lucid bug report and example! I agree that this is a bug, and I suspect you're correct that it stems from a spec change. (The recent overhaul of the spec started a migration toward becoming a bit more pedantic about structuring and versioning HGVS using a formal grammar.)

reece, Oct 15 '25 17:10