tsinfer
tsinfer copied to clipboard
Add a sample_id column to ancestors
Starts to address #604. Also changes the sampledata format to 3.1. We don't actually do anything with the stored IDs yet, though!
Codecov Report
Merging #607 (78d15c1) into main (69bb8c7) will not change coverage. The diff coverage is
100.00%.
@@ Coverage Diff @@
## main #607 +/- ##
=======================================
Coverage 93.11% 93.11%
=======================================
Files 17 17
Lines 5155 5155
Branches 959 959
=======================================
Hits 4800 4800
Misses 235 235
Partials 120 120
| Flag | Coverage Δ | |
|---|---|---|
| C | 93.11% <100.00%> (ø) |
|
| python | 96.33% <100.00%> (ø) |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Impacted Files | Coverage Δ | |
|---|---|---|
| tsinfer/formats.py | 98.44% <100.00%> (ø) |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
While we in tsinfer land, do you think we could merge this too @jeromekelleher ? What's the deal with the format spec ID changes - do we make point changes or make a full integer change?
We don't want to invalidate all existing AncestorData files - needs to take into account this column potentially being missing and needs to test for that (== complicated).
Is there a direct downstream need for this?
If we are making backwards-incompatible changes, I think this is a relatively simple addition to make, and will enable matching to actual samples, which is useful e.g. for viral trees. Personally I think it worth including in https://github.com/tskit-dev/tsinfer/milestone/3 but happy to chat about it further.
After discussion we decided to look at this after investigating the option of adding a samples argument to match_ancestors, which includes the required samples at the appropriate time.