hgvs icon indicating copy to clipboard operation
hgvs copied to clipboard

Uncertain ranges

Open theferrit32 opened this issue 2 years ago • 24 comments

theferrit32 avatar Sep 20 '23 16:09 theferrit32

Haven't added for c. yet. The hgvs.pymeta changes are likely similar but will need to add tests.

Can use the expressions from here: https://github.com/biocommons/hgvs/issues/225

theferrit32 avatar Sep 20 '23 16:09 theferrit32

@biocommons/hgvs-maintainers Would we be able to get feedback on this PR?

korikuzma avatar Nov 09 '23 15:11 korikuzma

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Dec 10 '23 01:12 github-actions[bot]

It would be great to get this PR in. Can we just make sure the "uncertain" field on BaseOffsetPosition or SimplePosition gets set correctly (and tested in the unit test)?

If we express a range in parenthesis, I'd assume we want to express that the whole range is uncertain. At least that's how I'd read the hgvs spec.

Thanks!

andreasprlic avatar Dec 11 '23 16:12 andreasprlic

Playing around with this branch some more I realized that g_to_c projection was not yet working. As such I had a go at enabling this. Since the behavior of this projection is somewhat undefined, the current approach picks the inner (=confident) interval of the uncertain range and uses that for the .c projection. The resulting hgvs_c string is then "confident" and not "uncertain" any longer.

andreasprlic avatar Dec 12 '23 07:12 andreasprlic

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Feb 02 '24 01:02 github-actions[bot]

This PR was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Feb 10 '24 01:02 github-actions[bot]

Can we work on this one on Monday during our dev session?

andreasprlic avatar Feb 10 '24 01:02 andreasprlic

As discussed on Monday, here a screenshot of NC_000009.11:g.108337304_108337428del. If we would represent this as a closely related, but imprecise event NC_000009.11:g.(?_108337304)_(108337428_?)del , how would we like the hgvs_c to be?

Would we want g_to_c to resul in NM_001079802.1:c.-10_105+10del for both the precise as well as imprecise event? If we were to map this to NM_001079802.1:c.(?_-10)_(105+10_?)del, do we want this to be comparable with the precise form in some form?

Can we represent partially precise observations like NM_001079802.1:c.-10_(105+10_?)del ?

Screenshot 2024-02-14 at 4 13 40 PM

andreasprlic avatar Feb 15 '24 00:02 andreasprlic

I believe that the c. uncertainty should match the g. uncertainty. That is, my preference is

NC_000009.11:g.(?_108337304)_(108337428_?)delNM_001079802.1:c.(?_-10)_(105+10_?)del

In principle, I don't see any problem with a partially uncertain event, although I don't see how it would apply in this case.

reece avatar Feb 18 '24 21:02 reece

Added imprecise hgvs_c strings, but it is not 100% there yet, since

NC_000009.11:g.(?_108337304)_(108337428_?)delNM_001079802.1:c.(-10)_(105+10)del at the moment. We want this to be NM_001079802.1:c.(?_-10)_(105+10_?)del. The problem is that the -10 comes out as a BaseOffsetPosition currently, when we prob would want it an Interval where the start has no "base" set. That prob. requires some more modifications on g_to_n in the alignmentmapper.

(see otherwise working cases in test_uncertain_projection_confidence).

Regarding partially precise events that were mentioned earlier: In the example above, there could be a slightly different event, where one might observe that there is indeed a precise breakpoint in the UTR on the left, but an unknown breakpoint in the UTR region downstream of the exon. It would prob be something like this: (uncertainty only on the right side) NC_000009.11:g.108337304_(108337428_?)del .

andreasprlic avatar Feb 20 '24 07:02 andreasprlic

Thanks for working this issue, @andreasprlic.

I converted this PR to a draft to make it clear that it was a WIP.

reece avatar Feb 20 '24 17:02 reece

@andreasprlic thank you for working on this. This is a fairly important issue for vrs-Python and the various datasets we are trying to transform from hgvs to vrs. Do you have any sense of when this work will be completed or if it could be handed off to another developer efficiently? No pressure but any information will assist our planning.

larrybabb avatar Feb 21 '24 15:02 larrybabb

@larrybabb Thanks for calling this out. So far we have been treating this one as a "nice to have", not a "must have".

At this point I believe g_to_c for imprecise events is working. Right now I am looking into the other direction, c_to_g, which will require some "grammar" modifications in how to parse the hgvs-strings. Perhaps it is best if I push what I got so far up, then somebody else who has worked on the parser previously could try to get imprecise hgvs_c parsing to work? @theferrit32 perhaps? :-)

andreasprlic avatar Feb 21 '24 16:02 andreasprlic

I pushed my current version of the code.

Note:

  • unit tests for imprecise g_to_c are passing.
  • unit test for partially precise g_to_c is passing too.
  • not passing yet: imprecise c_to_g.

andreasprlic avatar Feb 21 '24 16:02 andreasprlic

@larrybabb for your use case, do you need the support in the direction g_to_c, or do you also need c_to_g? I do wonder if the vast majority of use cases are addressed by this PR as it is, and the missing parsing of imprecise c_to_g would be just useful for rare edge cases (and could be done via a future PR).

andreasprlic avatar Feb 24 '24 07:02 andreasprlic

... for your use case, do you need the support in the direction g_to_c, or do you also need c_to_g?

@andreasprlic at this point in time I do not need either the g_to_c or the c_to_g support. But if I look ahead a bit I think I might like to have the g_to_c support down the road. I do not foresee needing the c_to_g on these ambiguous endpoints.

larrybabb avatar Mar 04 '24 16:03 larrybabb

@larrybabb Thanks, then I believe this PR is ready to review.

andreasprlic avatar Mar 04 '24 19:03 andreasprlic

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Apr 13 '24 01:04 github-actions[bot]