Uncertain ranges
Haven't added for c. yet. The hgvs.pymeta changes are likely similar but will need to add tests.
Can use the expressions from here: https://github.com/biocommons/hgvs/issues/225
@biocommons/hgvs-maintainers Would we be able to get feedback on this PR?
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
It would be great to get this PR in. Can we just make sure the "uncertain" field on BaseOffsetPosition or SimplePosition gets set correctly (and tested in the unit test)?
If we express a range in parenthesis, I'd assume we want to express that the whole range is uncertain. At least that's how I'd read the hgvs spec.
Thanks!
Playing around with this branch some more I realized that g_to_c projection was not yet working. As such I had a go at enabling this. Since the behavior of this projection is somewhat undefined, the current approach picks the inner (=confident) interval of the uncertain range and uses that for the .c projection. The resulting hgvs_c string is then "confident" and not "uncertain" any longer.
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This PR was closed because it has been stalled for 7 days with no activity.
Can we work on this one on Monday during our dev session?
As discussed on Monday, here a screenshot of NC_000009.11:g.108337304_108337428del. If we would represent this as a closely related, but imprecise event NC_000009.11:g.(?_108337304)_(108337428_?)del , how would we like the hgvs_c to be?
Would we want g_to_c to resul in NM_001079802.1:c.-10_105+10del for both the precise as well as imprecise event? If we were to map this to NM_001079802.1:c.(?_-10)_(105+10_?)del, do we want this to be comparable with the precise form in some form?
Can we represent partially precise observations like NM_001079802.1:c.-10_(105+10_?)del ?
I believe that the c. uncertainty should match the g. uncertainty. That is, my preference is
NC_000009.11:g.(?_108337304)_(108337428_?)del ⇔ NM_001079802.1:c.(?_-10)_(105+10_?)del
In principle, I don't see any problem with a partially uncertain event, although I don't see how it would apply in this case.
Added imprecise hgvs_c strings, but it is not 100% there yet, since
NC_000009.11:g.(?_108337304)_(108337428_?)del ⇔ NM_001079802.1:c.(-10)_(105+10)del at the moment. We want this to be NM_001079802.1:c.(?_-10)_(105+10_?)del. The problem is that the -10 comes out as a BaseOffsetPosition currently, when we prob would want it an Interval where the start has no "base" set. That prob. requires some more modifications on g_to_n in the alignmentmapper.
(see otherwise working cases in test_uncertain_projection_confidence).
Regarding partially precise events that were mentioned earlier: In the example above, there could be a slightly different event, where one might observe that there is indeed a precise breakpoint in the UTR on the left, but an unknown breakpoint in the UTR region downstream of the exon. It would prob be something like this: (uncertainty only on the right side) NC_000009.11:g.108337304_(108337428_?)del .
Thanks for working this issue, @andreasprlic.
I converted this PR to a draft to make it clear that it was a WIP.
@andreasprlic thank you for working on this. This is a fairly important issue for vrs-Python and the various datasets we are trying to transform from hgvs to vrs. Do you have any sense of when this work will be completed or if it could be handed off to another developer efficiently? No pressure but any information will assist our planning.
@larrybabb Thanks for calling this out. So far we have been treating this one as a "nice to have", not a "must have".
At this point I believe g_to_c for imprecise events is working. Right now I am looking into the other direction, c_to_g, which will require some "grammar" modifications in how to parse the hgvs-strings. Perhaps it is best if I push what I got so far up, then somebody else who has worked on the parser previously could try to get imprecise hgvs_c parsing to work? @theferrit32 perhaps? :-)
I pushed my current version of the code.
Note:
- unit tests for imprecise g_to_c are passing.
- unit test for partially precise g_to_c is passing too.
- not passing yet: imprecise c_to_g.
@larrybabb for your use case, do you need the support in the direction g_to_c, or do you also need c_to_g? I do wonder if the vast majority of use cases are addressed by this PR as it is, and the missing parsing of imprecise c_to_g would be just useful for rare edge cases (and could be done via a future PR).
... for your use case, do you need the support in the direction
g_to_c, or do you also needc_to_g?
@andreasprlic at this point in time I do not need either the g_to_c or the c_to_g support. But if I look ahead a bit I think I might like to have the g_to_c support down the road. I do not foresee needing the c_to_g on these ambiguous endpoints.
@larrybabb Thanks, then I believe this PR is ready to review.
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.