tskit Initial numba module

Part of #3135

Jun 18 '25 11:06 benjeffery

I've done quite a bit of numba investigation and found a way to use dataclasses in numba code. This seems to come at very little performance cost compared to tuples and is a lot nicer. Using a generator also seems to work fine!

Jun 18 '25 11:06 benjeffery

Codecov Report

:x: Patch coverage is 97.91667% with 4 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 89.72%. Comparing base (0ea8e64) to head (7763594). :warning: Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
python/tskit/jit/numba.py	97.91%	2 Missing and 2 partials :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3225      +/-   ##
==========================================
+ Coverage   89.67%   89.72%   +0.04%     
==========================================
  Files          28       29       +1     
  Lines       32308    32500     +192     
  Branches     5933     5949      +16     
==========================================
+ Hits        28972    29160     +188     
- Misses       1892     1894       +2     
- Partials     1444     1446       +2

Flag	Coverage Δ
c-tests	`86.71% <ø> (ø)`
lwt-tests	`80.38% <ø> (ø)`
python-c-tests	`88.24% <ø> (ø)`
python-tests	`98.79% <ø> (ø)`
python-tests-no-jit	`32.61% <97.91%> (?)`
python-tests-numpy1	`50.99% <0.00%> (-1.29%)`	:arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
python/tskit/jit/numba.py	`97.91% <97.91%> (ø)`

:rocket: New features to boost your workflow:

:snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Jun 18 '25 11:06 codecov[bot]

Having some CI weirdness that I'm not yet able to recreate.

Jun 18 '25 13:06 benjeffery

CI Fixed.

Here's some benchmarking with the "coalescent_nodes" method from #2778 on a TS with 12M edges:

Using ts.edge_diffs: 23.3s Calculating edge diffs and coalescent nodes in a single numba.njit function: 0.085s Using the classes here, calculating coalescent nodes in separate client numba.njit function: 0.093s

Jun 18 '25 23:06 benjeffery

Shall we move the first commit into its own PR? It's cluttering up this one and making it hard to see the real changes.

Jun 19 '25 12:06 jeromekelleher

I had imagined something lower level that was basically a copy of the TreePosition class from here: https://github.com/jeromekelleher/sc2ts/blob/7758245c3dc537aeec3b7cd6282241b65f8843dd/sc2ts/jit.py#L107

So, we don't try to provide Pythonic APIs, but just provide direct access to the edges out and edges in, which can be numba compiled like the example in the sc2ts code.

Jun 19 '25 12:06 jeromekelleher

just provide direct access to the edges out and edges in

That's how this code works, Your sc2ts code here:

    while tree_pos.next():
        for j in range(tree_pos.out_range[0], tree_pos.out_range[1]):
            e = tree_pos.edge_removal_order[j]
            c = edges_child[e]
            p = edges_parent[e]
            parent[c] = -1
            u = p
            while u != -1:
                num_samples[u] -= num_samples[c]
                u = parent[u]

becomes

    for tree_pos in numba_ts.edge_diffs():
        for j in range(*tree_pos.edges_out_index_range):
            e = numba_ts.indexes_edge_removal_order[j]
            c = edges_child[e]
            p = edges_parent[e]
            parent[c] = -1
            u = p
            while u != -1:
                num_samples[u] -= num_samples[c]
                u = parent[u]

It is still compiled, and 30% faster (for the coalesent nodes example)!

Jun 19 '25 12:06 benjeffery

Ahh, I didn't spot that sorry. How is it faster then?

I do think we should just stick with the TreePosition interface though, because we want to support seeking backwards as well, and ultimately randomly. There's no point in adding a layer for indirection on top of that.

Jun 19 '25 12:06 jeromekelleher

How is it faster then?

Mutating numpy arrays to maintain the state involves the following:

Creating a temporary list (build_list).
Performing bounds checks for the slice.
Copying the data from the list into the array's memory.

Whereas yielding lightweight immuatable objects is much more amenable to numba optimisation. We might be able to get the same gains by using native objects for the state rather than numpy arrays if you are set against iteration.

Jun 19 '25 13:06 benjeffery

Let's talk it through in person - I don't have time to form an educated opinion I'm afraid.

Jun 19 '25 13:06 jeromekelleher

I've tried to closely match the exisiting tsutil implementation with next and prev. Need to so some perf checks with this new code under numba.

Jun 27 '25 15:06 benjeffery

New code looks just as fast, proceeding to add some more tests. Will merge this then before doign docs.

Jul 01 '25 09:07 benjeffery

Getting some weird failures on Windows here, and coverage not counting for the new module, will fix.

I've added a stab at some docs.

Jul 03 '25 08:07 benjeffery

Re docs, eventually we probably want a "high performance" tutorial with some of this stuff, but I can have a stab at that after 1.0. There's some comments here: https://github.com/tskit-dev/tutorials/issues/150#issuecomment-980012717 and some code examples at https://github.com/tskit-dev/tutorials/issues/63

Jul 03 '25 08:07 hyanwong

@jeromekelleher Still a coverage issue that I understand and am fixing - but modolo that this is ready for a review and merge. I've started some benchmarking with the French-Canadian dataset, but will PR changes from that separately.

Aug 13 '25 13:08 benjeffery

I'm still uncomfortable with the code duplication

I've figured out a solution to this in https://github.com/tskit-dev/tskit/pull/3225/commits/41f59274a19f6a52f4317dc2468f4f36974dc9b9

I've addressed the rest of the comments

Aug 14 '25 17:08 benjeffery