Error when the VCF will have a zero position site
Fixes #2838
Note that this is a fairly breaking change that we should think about, given that the default is for msprime output to require the new flag to write_vcf.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (
1601a5e) 89.72% compared to head (593d5a2) 92.97%.
Additional details and impacted files
@@ Coverage Diff @@
## main #2901 +/- ##
==========================================
+ Coverage 89.72% 92.97% +3.24%
==========================================
Files 30 23 -7
Lines 30296 15937 -14359
Branches 5896 3140 -2756
==========================================
- Hits 27183 14817 -12366
+ Misses 1781 667 -1114
+ Partials 1332 453 -879
| Flag | Coverage Δ | |
|---|---|---|
| c-tests | ? |
|
| lwt-tests | 80.78% <ø> (ø) |
|
| python-c-tests | 67.71% <0.00%> (-0.03%) |
:arrow_down: |
| python-tests | 98.96% <100.00%> (+<0.01%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Files | Coverage Δ | |
|---|---|---|
| python/tskit/cli.py | 96.96% <100.00%> (+0.02%) |
:arrow_up: |
| python/tskit/trees.py | 98.74% <100.00%> (+<0.01%) |
:arrow_up: |
| python/tskit/vcf.py | 98.44% <100.00%> (+0.02%) |
:arrow_up: |
... and 7 files with indirect coverage changes
Continue to review full report in Codecov by Sentry.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 1601a5e...593d5a2. Read the comment docs.
One thought experiment is: what if some app using tskit is already generating all site positions on 1 <= x < seq length + 1?
One thought experiment is: what if some app using tskit is already generating all site positions on 1 <= x < seq length + 1?
That wouldn't be a valid tree sequence as all positions have to be less than sequence length.
That wouldn't be a valid tree sequence as all positions have to be less than sequence length.
Oops -- it would be if I'd written it correctly (w/o the +1). But my point should have been: we have no firm requirement that the minimum position actually used is zero.
I'm not sure I get what you mean - you can have a tree sequence with no sites? Maybe you mean we have no firm specification for how the reference sequence in the tree sequence maps onto the position field. We don't even have a requirement that the ref seq length is equal to sequence_length-1.
I'm not sure I get what you mean - you can have a tree sequence with no sites? Maybe you mean we have no firm specification for how the reference sequence in the tree sequence maps onto the position field. We don't even have a requirement that the ref seq length is equal to
sequence_length-1.
Imagine that someone only considers positions from [10, seqlen). Their "genome" starts at position 10, not 0. That is a valid tree sequence and they can choose their seqlen so that the max allowed site position matches whatever they have in mind, say 100. So they are modeling a gene segment from positions [10, 100] for some reason using a table collection with seqlen of 101.
This is a valid use of the API. What would this PR do to this use case?
I think my questions/confusion are related to some comments in the linked issue: seqlen must be > the max allowed position but the data model is not necessarily zero indexed.
I'll go away now...
This is just checking to see if the first position is zero, nothing else. The seqlen stuff was a digression on the thread
I love it (as much as is possible for a weird VCF hack). Nice solution.