tsinfer
Experimental hmm
Just opening this for visibility and discussion, not for merging. I've based the changes on 0.3.3 for simplicity, as sc2ts isn't yet compatible with some of the internal changes in tsinfer. I'll cherry-pick these changes onto another PR if we decide to implement this for real (probably adding an option to rescale by n or not).
A quick implementation of the changes discussed in https://github.com/jeromekelleher/sc2ts/issues/242 where we want to simplify the HMM and make it easier to reason about which Viterbi paths we care about.
Note: the only test that's failing is one that's poking into the details of parameter values. Everything else is passing fine.
Some quick notes here on using C11 atomics: getting things to compile on Unix platforms is easy, but Windows is problematic. It seems that MSVC doesn't support stdatomic.h, so I've hacked around it by simply not making the variable in question atomic. In the short term this is fine, as we're only using this in sc2ts, which doesn't support Windows. It does raise some questions about whether putting this into tskit is worth the hassle, though.
I don't know anything about atomics, but https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/ suggests that stdatomic.h is coming to Windows, and can be enabled in the current MSVC using the /experimental:c11atomics switch?
Good to know. I guess that'll filter down to Python builds sometime around 3.15 or so (Python builds against a specific MSVC version, 14.x for what we're working with here).
Codecov Report
:x: Patch coverage is 80.86957% with 22 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 93.26%. Comparing base (8bd218d) to head (7522bcc).
Additional details and impacted files
| Coverage Diff | 0.3.3-backport | #959 | +/- |
|---|---|---|---|
| Coverage | 93.56% | 93.26% | -0.30% |
| Files | 17 | 17 | |
| Lines | 5546 | 5628 | +82 |
| Branches | 1007 | 987 | -20 |
| Hits | 5189 | 5249 | +60 |
| Misses | 235 | 248 | +13 |
| Partials | 122 | 131 | +9 |
| Flag | Coverage Δ | |
|---|---|---|
| C | 93.26% <80.86%> (-0.30%) | :arrow_down: |
| python | 96.57% <80.00%> (-0.08%) | :arrow_down: |
@hyanwong @szhan @benjeffery - one thing to note here is that the last couple of commits remove the low-level requirement that the ancestral state must be 0. So we should definitely port this much into the mainline tsinfer at some point, as it has been a persistent headache.
Perhaps worth doing for the next release, as it may simplify some aspects of the VCF Zarr processing?
Very nice! I assume you mean just bringing across only the code that removes the requirement?
I think this is something we need to merge in and release as part of tsinfer soon. I think we can expose the core functionality (whether you divide by n in the HMM) as a "private" API that's used by sc2ts, and keep the standard tsinfer behaviour as the default. There's probably some exploration to be done anyway on what effect this really has on the HMM, which @duncanMR will be looking into in a few months.
I believe all this is now merged in other PRs.
Going to leave the branch here for a while.