Added methods to sort and squash segments in the IbdFinder output.
Fixes #2459 -- see the issue for discussion about why this is needed. Draft only at the moment.
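For anyone skimming the thread, here is a rough sketch of what "sort and squash" could look like on plain segment tuples. It is not the code in this PR. The `Segment` tuple and the `sort_and_squash` helper are hypothetical names, and it assumes that "squash" means merging abutting segments that share the same ancestral node.

```python
from collections import namedtuple

# Hypothetical stand-in for an IBD segment: half-open interval [left, right) plus a node.
Segment = namedtuple("Segment", ["left", "right", "node"])


def sort_and_squash(segments):
    """Merge abutting segments with the same node, then return them sorted.

    Assumption: two segments are squashable when they share a node and the
    right end of one equals the left end of the next.
    """
    # Group by node first so that squashable segments end up next to each other.
    ordered = sorted(segments, key=lambda s: (s.node, s.left))
    squashed = []
    for seg in ordered:
        last = squashed[-1] if squashed else None
        if last is not None and last.node == seg.node and last.right == seg.left:
            # Extend the previous segment instead of appending a new one.
            squashed[-1] = last._replace(right=seg.right)
        else:
            squashed.append(seg)
    # Return in a deterministic order so two outputs can be compared directly.
    return sorted(squashed, key=lambda s: (s.left, s.right, s.node))


segs = [Segment(5, 10, 2), Segment(0, 5, 2), Segment(0, 10, 3), Segment(12, 20, 2)]
result = sort_and_squash(segs)
```

Sorting the final list means the output no longer depends on the order in which segments were discovered, which is the property you would want when comparing two implementations.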
PR Checklist:
- [ ] Tests that fully cover new/changed functionality.
- [ ] Documentation including tutorial content if appropriate.
- [ ] Changelogs, if there are API changes.
Codecov Report
Merging #2460 (49ce776) into main (4288e31) will not change coverage. The diff coverage is n/a.
Coverage Diff

| | main | #2460 | +/- |
|---|---|---|---|
| Coverage | 93.43% | 93.43% | |
| Files | 28 | 28 | |
| Lines | 27401 | 27401 | |
| Branches | 1255 | 1255 | |
| Hits | 25601 | 25601 | |
| Misses | 1766 | 1766 | |
| Partials | 34 | 34 | |

| Flag | Coverage Δ | |
|---|---|---|
| c-tests | 92.24% <ø> (ø) | |
| lwt-tests | 89.05% <ø> (ø) | |
| python-c-tests | 71.17% <ø> (ø) | |
| python-tests | 98.95% <ø> (ø) | |
Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 4288e31...49ce776.
Hey @hyanwong! Unfortunately, we both made some pretty involved changes to the test_highlevel.py file in the past few days. I think I've fixed up the merge conflict, but I can't figure out how to get my tests to run with the argument that I added into get_example_tree_sequences(). Currently I'm just skipping these tests because I don't fully understand the syntax of your additions -- do you know how to ensure these tests 'see' the argument I've added in?
It was @benjeffery who reworked get_example_tree_sequences() to be cached and to run in parallel, so that parametrize works properly. He might be able to suggest a fix?
I'll take a look now
I think the simplest thing here is to remove custom_max.
I've done this in https://github.com/tskit-dev/tskit/commit/a3aa3da8f2ba991a48967011a230a32cf6fe62cd
Currently, I've set the N numbers to those generated when custom_max was 15. Once all the IBD tests pass, we should add some even numbers back in and skip them in the IBD tests? (I'm assuming that you need them to be odd, as all those generated by custom_max were.)
Thanks @benjeffery and sorry for the late reply -- I had to take a few days off to deal with some urgent moving-to-the-US things.
> Currently, I've set the N numbers to those generated when custom_max was 15. Once all the IBD tests pass, we should add some even numbers back in and skip them in the IBD tests? (I'm assuming that you need them to be odd, as all those generated by custom_max were.)
I added this argument because some of my changes in this PR make IbdFinder run much more slowly, to the point where the tests look like they're hanging. (This is likely a Python-specific problem: as we discuss in #2459, the C code works differently because it uses an AVL tree to do the sorting.) But I didn't want to omit the tests entirely, as they cover some useful edge cases, so I wanted to run them with smaller sample sizes than the ones hard-coded in.
> I added this argument because some of my changes in this PR make IbdFinder run much more slowly, to the point where the tests look like they're hanging. (This is likely a Python-specific problem: as we discuss in #2459, the C code works differently because it uses an AVL tree to do the sorting.) But I didn't want to omit the tests entirely, as they cover some useful edge cases, so I wanted to run them with smaller sample sizes than the ones hard-coded in.
We can leave the slow examples in the shared example set, but skip them in these tests?
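Something along these lines might work: a minimal sketch of keeping every example in the shared list while marking the large ones as skipped for the IBD tests only. `EXAMPLES`, `MAX_IBD_SAMPLES`, and `ibd_params` are made-up names for illustration, not anything in test_highlevel.py, and the threshold of 15 is only an assumption borrowed from the custom_max value mentioned above.

```python
import pytest

# Hypothetical stand-ins for the real example set: (example id, sample size).
EXAMPLES = [("n=3", 3), ("n=8", 8), ("n=15", 15), ("n=100", 100)]

# Assumed cut-off above which the Python IbdFinder is too slow to test.
MAX_IBD_SAMPLES = 15


def ibd_params(examples, max_samples=MAX_IBD_SAMPLES):
    """Wrap each example in pytest.param, attaching a skip mark to the slow ones."""
    params = []
    for name, n in examples:
        marks = []
        if n > max_samples:
            marks.append(pytest.mark.skip(reason="too slow for the Python IbdFinder"))
        params.append(pytest.param(n, id=name, marks=marks))
    return params


@pytest.mark.parametrize("n", ibd_params(EXAMPLES))
def test_ibd_example(n):
    # Placeholder body; the real test would run IbdFinder on the example.
    assert n <= MAX_IBD_SAMPLES
```

Other test classes can keep parametrizing over the full example list directly, so the slow examples still get exercised everywhere except here, and the skipped cases show up in the pytest report rather than silently disappearing.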