
Uncertainty - implicit behaviors

Open davmlaw opened this issue 3 months ago • 2 comments

Raised new issue so as not to derail the pull request going through.

In a few places there is implicit behavior (such as taking the inner interval). We should let the user configure this explicitly, with sane defaults.

davmlaw avatar Sep 05 '25 15:09 davmlaw

@davmlaw : I don't understand the concern or proposal here. Would you please elaborate?

reece avatar Oct 15 '25 16:10 reece

You sometimes need to get an exact interval from an uncertain interval, for instance to retrieve sequences from a FASTA file.

By default we currently take the outer positions if we can, falling back to the inner ones. The only code that uses outer_confidence=False is dup normalization.

It's probably easier to show the current method and the TODO comment:

def get_start_end(
    var, outer_confidence=True
) -> tuple[
    hgvs.location.SimplePosition | hgvs.location.BaseOffsetPosition,
    hgvs.location.SimplePosition | hgvs.location.BaseOffsetPosition,
]:
    """Get start and end positions from a variant or interval.

    This function handles all position types (SimplePosition, BaseOffsetPosition,
    Interval, BaseOffsetInterval) and returns the appropriate start and end positions.
    It can be expected that the returned positions have a base and an uncertain property.

    By default we return the outer confidence positions. However, if that position
    does not have a base, we return the inner confidence positions.

    TODO: add a new optional parameter that allows to define the strictness of the returned positions.
    The current behavior is more alike to an "auto" mode, since we might fall back to the inner confidence positions
    if the outer confidence positions do not have a base. A potential "strict" mode would only return the outer
    confidence positions, and raise an error if the outer confidence positions do not have a base.
    """
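To make the difference between the "auto" and "strict" modes from the TODO concrete, here is a minimal sketch. It does not use the real hgvs classes: `Pos`, `Range`, and `pick_start_end` are hypothetical stand-ins, the `strict` parameter is the proposed one rather than something that exists today, and the per-endpoint fallback is an assumption about how "auto" should behave.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Pos:
    """Hypothetical stand-in for a position; base is None for an unknown '?'."""
    base: Optional[int]


@dataclass(frozen=True)
class Range:
    """Hypothetical stand-in for one uncertain endpoint, e.g. (1_5)."""
    start: Pos
    end: Pos


def _pick(preferred: Pos, fallback: Pos, strict: bool) -> Pos:
    # Use the preferred position when it has a base.
    if preferred.base is not None:
        return preferred
    # "strict": refuse to silently switch to the other confidence level.
    if strict:
        raise ValueError("requested confidence position has no base")
    # "auto" (assumed to match the current default): fall back.
    return fallback


def pick_start_end(
    start: Range, end: Range, outer_confidence: bool = True, strict: bool = False
) -> Tuple[Pos, Pos]:
    """Reduce an interval like (a_b)_(c_d) to a single (start, end) pair."""
    if outer_confidence:
        # Outer confidence: leftmost start and rightmost end.
        return _pick(start.start, start.end, strict), _pick(end.end, end.start, strict)
    # Inner confidence: rightmost start and leftmost end.
    return _pick(start.end, start.start, strict), _pick(end.start, end.end, strict)
```

For an interval like (?_5)_(10_15), the outer start has no base, so the "auto" call returns positions 5 and 15, while `strict=True` raises `ValueError` instead of mixing confidence levels.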

The implementation would presumably involve global_config and the plumbing that goes with it.

I can also see the general utility of a SequenceVariant method, as_certain(outer_confidence=True, strict=True) -> SequenceVariant, for instance if people want to convert HGVS for other systems that don't support uncertainty (everyone but us!).

We should probably add some explicit tests for get_start_end and get_start_end_interbase (I can't see any), given how critical they are.

davmlaw avatar Oct 16 '25 00:10 davmlaw