Uncertainty - implicit behaviors
Raised new issue so as not to derail the pull request going through.
In a few places there is implicit behavior (such as taking the inner interval). We should let the user configure this explicitly, with sane defaults.
@davmlaw : I don't understand the concern or proposal here. Would you please elaborate?
You sometimes need to get an exact interval from an uncertain interval, for instance to retrieve sequences from a FASTA file.
By default we currently take the outer positions if we can, falling back to the inner ones. The only code that uses outer_confidence=False is dup normalization.
It's probably easier to show the current method and the TODO comment:
def get_start_end(
    var, outer_confidence=True
) -> tuple[
    hgvs.location.SimplePosition | hgvs.location.BaseOffsetPosition,
    hgvs.location.SimplePosition | hgvs.location.BaseOffsetPosition,
]:
    """Get start and end positions from a variant or interval.
    This function handles all position types (SimplePosition, BaseOffsetPosition,
    Interval, BaseOffsetInterval) and returns the appropriate start and end positions.
    It can be expected that the returned positions have a base and an uncertain property.
    By default we return the outer confidence positions. However, if that position
    does not have a base, we return the inner confidence positions.
    TODO: add a new optional parameter that allows to define the strictness of the returned positions.
    The current behavior is more alike to an "auto" mode, since we might fall back to the inner confidence positions
    if the outer confidence positions do not have a base. A potential "strict" mode would only return the outer confidence positions, and raise an error if the outer confidence positions do not have a base.
    """
The implementation would presumably go through global_config and so on.
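To make the "auto" vs "strict" distinction concrete, here is a minimal standalone sketch. It does not use the hgvs classes: Pos, UncertainInterval and resolve_start_end are hypothetical names made up for illustration, and resolving each endpoint independently is an assumption on my part, not necessarily what get_start_end does today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pos:
    """Hypothetical stand-in for a position; base=None models an unknown '?'."""
    base: Optional[int]


@dataclass(frozen=True)
class UncertainInterval:
    """Hypothetical model of (outer_start_inner_start)_(inner_end_outer_end)."""
    outer_start: Pos
    inner_start: Pos
    inner_end: Pos
    outer_end: Pos


def _pick(preferred: Pos, fallback: Pos, strict: bool) -> Pos:
    """Return the preferred position, falling back only when strict is False."""
    if preferred.base is not None:
        return preferred
    if strict:
        raise ValueError("preferred position has no base and strict=True")
    if fallback.base is not None:
        return fallback
    raise ValueError("neither position has a base")


def resolve_start_end(
    iv: UncertainInterval, outer_confidence: bool = True, strict: bool = False
) -> tuple[Pos, Pos]:
    """Resolve an uncertain interval to exact start and end positions.

    strict=False is the "auto" mode (fall back to the other confidence level);
    strict=True raises instead of falling back.
    """
    if outer_confidence:
        start = _pick(iv.outer_start, iv.inner_start, strict)
        end = _pick(iv.outer_end, iv.inner_end, strict)
    else:
        start = _pick(iv.inner_start, iv.outer_start, strict)
        end = _pick(iv.inner_end, iv.outer_end, strict)
    return start, end


# (?_100)_(200_?): the outer positions are unknown, so "auto" falls back to inner.
iv = UncertainInterval(Pos(None), Pos(100), Pos(200), Pos(None))
print(resolve_start_end(iv))
```

With strict=True the same interval raises ValueError instead of silently returning the inner positions, which is the explicit behavior the TODO asks for.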
I can also see the general utility of a SequenceVariant method as_certain(outer_confidence=True, strict=True) -> SequenceVariant, for instance if people want to convert HGVS for other systems that don't support uncertainty (everyone but us!).
We should probably add some explicit tests for get_start_end and get_start_end_interbase (I can't see any), given how critical they are.