Add RLE params to VRS-annotated VCFs

Open jsstevenson opened this issue 4 months ago • 4 comments

Currently, the VCF annotator can optionally put VRS start, end, and sequence onto a VCF for all alleles. This is sufficient to reconstruct the object if the state is a LiteralSequenceExpression (LSE), but not if it's a ReferenceLengthExpression (RLE). We should add the additional RLE params needed to enable this.

jsstevenson avatar Aug 14 '25 13:08 jsstevenson

I think the minimum necessary values are:

  • length
  • repeatSubunitLength

https://github.com/ga4gh/vrs/blob/2.0.1/schema/vrs/json/ReferenceLengthExpression
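To illustrate why those two values are the missing piece, here is a minimal sketch using plain dicts rather than the vrs-python model classes. The helper name `extra_params` is hypothetical, and the example values assume an ACG to ACGACG change; only the field names `length`, `repeatSubunitLength`, and `sequence` come from the schema linked above.

```python
# Plain-dict shapes of the two state types discussed in this issue.
# An LSE is fully described by its sequence.
lse = {"type": "LiteralSequenceExpression", "sequence": "ACG"}

# An RLE also carries length and repeatSubunitLength; sequence is optional.
rle = {
    "type": "ReferenceLengthExpression",
    "length": 6,
    "repeatSubunitLength": 3,
    "sequence": "ACGACG",
}


def extra_params(state: dict) -> dict:
    """Return the values that a sequence alone does not carry (hypothetical helper)."""
    if state["type"] == "ReferenceLengthExpression":
        return {key: state[key] for key in ("length", "repeatSubunitLength")}
    return {}


print(extra_params(lse))
print(extra_params(rle))
```

For an LSE nothing extra is needed, while for an RLE the two integers have to be stored somewhere in the VCF.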

theferrit32 avatar Oct 14 '25 17:10 theferrit32

Proposing length -> VRS_Lengths and repeatSubunitLength -> VRS_RepeatSubunitLengths.

We may run into a situation where the ALT will be an RLE but the REF will not be (or the inverse?).

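In a case like that, following other VCF conventions, I'd propose using . in VRS_States for the allele that is an RLE, and . in VRS_Lengths and VRS_RepeatSubunitLengths for the allele that is not. For example, with a literal REF and an RLE ALT: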

REF=ACG
ALT=ACGACG
VRS_States=ACG,.
VRS_Lengths=.,6
VRS_RepeatSubunitLengths=.,3
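A rough sketch of how a reader could turn these proposed fields back into state objects, assuming the INFO values arrive as comma-separated strings with `.` as the missing-value placeholder. The function names `parse_field` and `states_from_info` are hypothetical and not part of vrs-python, and the output uses plain dicts instead of the library's model classes.

```python
def parse_field(value, cast=str):
    """Split a per-allele INFO value, mapping the '.' placeholder to None."""
    return [None if item == "." else cast(item) for item in value.split(",")]


def states_from_info(info):
    """Rebuild one state dict per allele (REF first) from the proposed fields."""
    sequences = parse_field(info["VRS_States"])
    lengths = parse_field(info["VRS_Lengths"], int)
    subunits = parse_field(info["VRS_RepeatSubunitLengths"], int)
    if not (len(sequences) == len(lengths) == len(subunits)):
        raise ValueError("per-allele fields must have the same number of entries")

    states = []
    for sequence, length, subunit in zip(sequences, lengths, subunits):
        if length is not None and subunit is not None:
            # Both RLE params present: treat the allele as an RLE.
            state = {
                "type": "ReferenceLengthExpression",
                "length": length,
                "repeatSubunitLength": subunit,
            }
            if sequence is not None:
                state["sequence"] = sequence
        elif sequence is not None:
            # Only a sequence present: treat the allele as an LSE.
            state = {"type": "LiteralSequenceExpression", "sequence": sequence}
        else:
            raise ValueError("allele has neither a sequence nor RLE params")
        states.append(state)
    return states


info = {
    "VRS_States": "ACG,.",
    "VRS_Lengths": ".,6",
    "VRS_RepeatSubunitLengths": ".,3",
}
ref_state, alt_state = states_from_info(info)
```

With the example above, the REF comes back as an LSE and the ALT as an RLE. Whether the annotator should also write the sequence for RLE alleles is left open here.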

theferrit32 avatar Oct 14 '25 17:10 theferrit32

@theferrit32 This looks good to me. repeatSubunitLength is an int in the spec.

I don't think we ever intended LengthExpression to be the parent of ReferenceLengthExpression, since these are two very different concepts. I think we should also raise the following discussion with @ahwagner.

@ahwagner Should we make ReferenceLengthExpression.length an int only and drop the Range option? What reasonable example would ever need a Range? I think we should also evaluate this question for the LengthExpression.length field. But since it's draft and I'm not sure who is using it yet, it may be premature to discuss. I think we should keep things minimal and expand the scope of the datatype once we have real-world needs from one or more implementers.

larrybabb avatar Oct 15 '25 14:10 larrybabb

# TODO https://github.com/ga4gh/vrs-python/blob/b9d9887975e327158d0095050c44f370f2919b77/src/ga4gh/vrs/normalize.py#L138-L140

Created https://github.com/ga4gh/vrs-python/issues/587

theferrit32 avatar Oct 16 '25 16:10 theferrit32