Initial numba module
Part of #3135
I've done quite a bit of numba investigation and found a way to use dataclasses in numba code. This seems to come at very little performance cost compared to tuples and is a lot nicer. Using a generator also seems to work fine!
Codecov Report
:x: Patch coverage is 97.91667% with 4 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 89.72%. Comparing base (0ea8e64) to head (7763594).
:warning: Report is 1 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| python/tskit/jit/numba.py | 97.91% | 2 Missing and 2 partials :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #3225 +/- ##
==========================================
+ Coverage 89.67% 89.72% +0.04%
==========================================
Files 28 29 +1
Lines 32308 32500 +192
Branches 5933 5949 +16
==========================================
+ Hits 28972 29160 +188
- Misses 1892 1894 +2
- Partials 1444 1446 +2
| Flag | Coverage Δ | |
|---|---|---|
| c-tests | 86.71% <ø> (ø) |
|
| lwt-tests | 80.38% <ø> (ø) |
|
| python-c-tests | 88.24% <ø> (ø) |
|
| python-tests | 98.79% <ø> (ø) |
|
| python-tests-no-jit | 32.61% <97.91%> (?) |
|
| python-tests-numpy1 | 50.99% <0.00%> (-1.29%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Files with missing lines | Coverage Δ | |
|---|---|---|
| python/tskit/jit/numba.py | 97.91% <97.91%> (ø) |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Having some CI weirdness that I'm not yet able to recreate.
CI Fixed.
Here's some benchmarking with the "coalescent_nodes" method from #2778 on a TS with 12M edges:
Using ts.edge_diffs: 23.3s
Calculating edge diffs and coalescent nodes in a single numba.njit function: 0.085s
Using the classes here, calculating coalescent nodes in separate client numba.njit function: 0.093s
Shall we move the first commit into its own PR? It's cluttering up this one and making it hard to see the real changes.
I had imagined something lower level that was basically a copy of the TreePosition class from here: https://github.com/jeromekelleher/sc2ts/blob/7758245c3dc537aeec3b7cd6282241b65f8843dd/sc2ts/jit.py#L107
So, we don't try to provide Pythonic APIs, but just provide direct access to the edges out and edges in, which can be numba compiled like the example in the sc2ts code.
just provide direct access to the edges out and edges in
That's how this code works, Your sc2ts code here:
while tree_pos.next():
for j in range(tree_pos.out_range[0], tree_pos.out_range[1]):
e = tree_pos.edge_removal_order[j]
c = edges_child[e]
p = edges_parent[e]
parent[c] = -1
u = p
while u != -1:
num_samples[u] -= num_samples[c]
u = parent[u]
becomes
for tree_pos in numba_ts.edge_diffs():
for j in range(*tree_pos.edges_out_index_range):
e = numba_ts.indexes_edge_removal_order[j]
c = edges_child[e]
p = edges_parent[e]
parent[c] = -1
u = p
while u != -1:
num_samples[u] -= num_samples[c]
u = parent[u]
It is still compiled, and 30% faster (for the coalesent nodes example)!
Ahh, I didn't spot that sorry. How is it faster then?
I do think we should just stick with the TreePosition interface though, because we want to support seeking backwards as well, and ultimately randomly. There's no point in adding a layer for indirection on top of that.
How is it faster then?
Mutating numpy arrays to maintain the state involves the following:
- Creating a temporary list (build_list).
- Performing bounds checks for the slice.
- Copying the data from the list into the array's memory.
Whereas yielding lightweight immuatable objects is much more amenable to numba optimisation. We might be able to get the same gains by using native objects for the state rather than numpy arrays if you are set against iteration.
Let's talk it through in person - I don't have time to form an educated opinion I'm afraid.
I've tried to closely match the exisiting tsutil implementation with next and prev. Need to so some perf checks with this new code under numba.
New code looks just as fast, proceeding to add some more tests. Will merge this then before doign docs.
Getting some weird failures on Windows here, and coverage not counting for the new module, will fix.
I've added a stab at some docs.
Re docs, eventually we probably want a "high performance" tutorial with some of this stuff, but I can have a stab at that after 1.0. There's some comments here: https://github.com/tskit-dev/tutorials/issues/150#issuecomment-980012717 and some code examples at https://github.com/tskit-dev/tutorials/issues/63
@jeromekelleher Still a coverage issue that I understand and am fixing - but modolo that this is ready for a review and merge. I've started some benchmarking with the French-Canadian dataset, but will PR changes from that separately.
I'm still uncomfortable with the code duplication
I've figured out a solution to this in https://github.com/tskit-dev/tskit/pull/3225/commits/41f59274a19f6a52f4317dc2468f4f36974dc9b9
I've addressed the rest of the comments