tskit
tskit copied to clipboard
Initial sketch of read-only tables implementatation
Initial proof of concept for making the TableCollection returned by ts.tables
read-only. I've only done the plumbing for the NodeTable here, but the extension is pretty clear. The implementation is pretty confusing at the moment, I think it could be simplified quite a bit.
I also had a look at what we could do in terms of returning direct read-only views of the column data. I think we could do this when the tables are in read-only model, so that ts.tables.nodes.time
is an efficient operation. It doesn't look too bad, actually.
While we're breaking stuff, we should also set all column arrays that we return to being read-only, even when they are copies returned when the tables aren't in read-only mode. This will hopefully let us support the (much more difficult) direct setting of table column memory from numpy arrays, at some point.
Any thoughts?
I'm trying to wrap my head round this. So when tables get made from a tree sequence the tables are marked as read only by having a reference to the ts, makes sense.
What about the other way around when TableCollection.tree_sequence
is called? Do we keep that as a copy operation?
What about the other way around when TableCollection.tree_sequence is called? Do we keep that as a copy operation?
For the first pass I think yes. The main thing we're trying to streamline here is:
ts = tskit.load("bigfile.trees")
# analyse ts in lots of ways including using ts.tables.nodes.time and other column references
The creating-a-tree sequence-from-tables operation is much more niche I think.
I guess we could mark the input TableCollection as read-only when we call .tree_sequence()
on it, though.
Great, was thinking that would be the first workflow to optimise, and doing just that one first makes sense.
I guess we could mark the input TableCollection as read-only when we call .tree_sequence() on it, though.
Pretty sure this would break a lot of things and is an unexpected side-effect to make the TC read-only, we'd likely need to make copy be the default and add an argument, or have a separate method?
We should make sure this change fixes #2080 while we're in here doing this type of stuff.
Revisiting this in the light of #2441 and recent updates (and rebasing on them), and it looks like it basically works so we should probably try and push it through. I'll tip away at it as I have time.
This is a significant breaking change so we'd need a major version number bump for it.
Codecov Report
Merging #2057 (c41dca7) into main (4c4ca2c) will decrease coverage by
1.55%
. The diff coverage is73.13%
.
@@ Coverage Diff @@
## main #2057 +/- ##
==========================================
- Coverage 93.44% 91.88% -1.56%
==========================================
Files 28 28
Lines 27380 27071 -309
Branches 1253 1253
==========================================
- Hits 25584 24875 -709
- Misses 1762 2162 +400
Partials 34 34
Flag | Coverage Δ | |
---|---|---|
c-tests | 92.26% <ø> (ø) |
|
lwt-tests | 89.05% <ø> (ø) |
|
python-c-tests | 67.74% <73.13%> (-3.46%) |
:arrow_down: |
python-tests | 98.95% <100.00%> (ø) |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
python/_tskitmodule.c | 84.46% <72.72%> (-6.55%) |
:arrow_down: |
python/tskit/trees.py | 98.72% <100.00%> (ø) |
|
python/lwt_interface/tskit_lwt_interface.h | 90.50% <0.00%> (-0.32%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 4c4ca2c...c41dca7. Read the comment docs.