tskit icon indicating copy to clipboard operation
tskit copied to clipboard

Initial sketch of read-only tables implementatation

Open jeromekelleher opened this issue 2 years ago • 6 comments

Initial proof of concept for making the TableCollection returned by ts.tables read-only. I've only done the plumbing for the NodeTable here, but the extension is pretty clear. The implementation is pretty confusing at the moment, I think it could be simplified quite a bit.

I also had a look at what we could do in terms of returning direct read-only views of the column data. I think we could do this when the tables are in read-only model, so that ts.tables.nodes.time is an efficient operation. It doesn't look too bad, actually.

While we're breaking stuff, we should also set all column arrays that we return to being read-only, even when they are copies returned when the tables aren't in read-only mode. This will hopefully let us support the (much more difficult) direct setting of table column memory from numpy arrays, at some point.

Any thoughts?

jeromekelleher avatar Dec 10 '21 17:12 jeromekelleher

I'm trying to wrap my head round this. So when tables get made from a tree sequence the tables are marked as read only by having a reference to the ts, makes sense.

What about the other way around when TableCollection.tree_sequence is called? Do we keep that as a copy operation?

benjeffery avatar Dec 16 '21 12:12 benjeffery

What about the other way around when TableCollection.tree_sequence is called? Do we keep that as a copy operation?

For the first pass I think yes. The main thing we're trying to streamline here is:

ts = tskit.load("bigfile.trees")
# analyse ts in lots of ways including using ts.tables.nodes.time and other column references

The creating-a-tree sequence-from-tables operation is much more niche I think.

I guess we could mark the input TableCollection as read-only when we call .tree_sequence() on it, though.

jeromekelleher avatar Dec 16 '21 12:12 jeromekelleher

Great, was thinking that would be the first workflow to optimise, and doing just that one first makes sense.

I guess we could mark the input TableCollection as read-only when we call .tree_sequence() on it, though.

Pretty sure this would break a lot of things and is an unexpected side-effect to make the TC read-only, we'd likely need to make copy be the default and add an argument, or have a separate method?

benjeffery avatar Dec 16 '21 13:12 benjeffery

We should make sure this change fixes #2080 while we're in here doing this type of stuff.

jeromekelleher avatar Jan 05 '22 17:01 jeromekelleher

Revisiting this in the light of #2441 and recent updates (and rebasing on them), and it looks like it basically works so we should probably try and push it through. I'll tip away at it as I have time.

This is a significant breaking change so we'd need a major version number bump for it.

jeromekelleher avatar Jul 25 '22 17:07 jeromekelleher

Codecov Report

Merging #2057 (c41dca7) into main (4c4ca2c) will decrease coverage by 1.55%. The diff coverage is 73.13%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2057      +/-   ##
==========================================
- Coverage   93.44%   91.88%   -1.56%     
==========================================
  Files          28       28              
  Lines       27380    27071     -309     
  Branches     1253     1253              
==========================================
- Hits        25584    24875     -709     
- Misses       1762     2162     +400     
  Partials       34       34              
Flag Coverage Δ
c-tests 92.26% <ø> (ø)
lwt-tests 89.05% <ø> (ø)
python-c-tests 67.74% <73.13%> (-3.46%) :arrow_down:
python-tests 98.95% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/_tskitmodule.c 84.46% <72.72%> (-6.55%) :arrow_down:
python/tskit/trees.py 98.72% <100.00%> (ø)
python/lwt_interface/tskit_lwt_interface.h 90.50% <0.00%> (-0.32%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 4c4ca2c...c41dca7. Read the comment docs.

codecov[bot] avatar Jul 25 '22 19:07 codecov[bot]