tskit Refactor tsk_bit_array

I'm a bit confused about what the tsk_bit_array_ struct really is. I thought it was a straightforward bit-set implementation, but there's the idea of rows, which I'm confused by. Is it a 2D bit set, i.e. a list of N independent bit sets? If so, I think we should change the API to be more explicit about this, and make operations work on rows and bits directly, rather than using the get_row operation to get a row and then calling methods which just work on a single row (like intersect, subtract, etc.).

Also, I think it would be clearer if we used set-theoretic operation names throughout, so add -> union, etc.

So, to be clear, we'd have operations like

tsk_bit_array_set_bit(self, row, bit)
tsk_bit_array_contains(self, row, bit)

etc
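To make the proposal above concrete, here is a minimal sketch of what a row-addressed interface could look like. Everything in it is an assumption for illustration: the bitset2d_* names, the 32-bit word size and the contiguous row-major layout are hypothetical, not the existing tskit API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type: num_rows independent bit sets stored back to back. */
typedef struct {
    size_t num_rows;  /* number of independent bit sets */
    size_t row_words; /* 32-bit words per row */
    uint32_t *data;   /* num_rows * row_words words, all zero initially */
} bitset2d_t;

/* Allocate room for num_rows sets of num_bits bits each.
 * Returns 0 on success, -1 if allocation fails. */
static int
bitset2d_init(bitset2d_t *self, size_t num_bits, size_t num_rows)
{
    assert(num_bits > 0 && num_rows > 0);
    self->num_rows = num_rows;
    self->row_words = (num_bits + 31) / 32; /* round up to whole words */
    self->data = calloc(num_rows * self->row_words, sizeof(*self->data));
    return self->data == NULL ? -1 : 0;
}

static void
bitset2d_free(bitset2d_t *self)
{
    free(self->data);
    self->data = NULL;
}

/* Add `bit` to the set stored in `row`. */
static void
bitset2d_set_bit(bitset2d_t *self, size_t row, size_t bit)
{
    assert(row < self->num_rows && bit / 32 < self->row_words);
    self->data[row * self->row_words + bit / 32] |= (uint32_t) 1 << (bit % 32);
}

/* Test whether `bit` is a member of the set stored in `row`. */
static bool
bitset2d_contains(const bitset2d_t *self, size_t row, size_t bit)
{
    assert(row < self->num_rows && bit / 32 < self->row_words);
    return (self->data[row * self->row_words + bit / 32] >> (bit % 32)) & 1u;
}
```

With this shape the caller never holds a temporary row object: the row index travels with every call, so each row is visibly an independent set.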

What do you think @lkirk?

Aug 30 '23 09:08 jeromekelleher

Yes, indeed these arrays are a list of N independent bit sets. I actually like the word "bit set" more than "bit array" as well. In addition to the refactor, maybe renaming things to tsk_bitset_* is a good idea as well?

I like your suggestions; they will simplify the calling code quite a bit.

Aug 31 '23 08:08 lkirk

Closing for inactivity and labelling "future", please re-open if you plan to work on this.

Jun 12 '25 22:06 benjeffery

Has this been done already @lkirk? If so, we can remove the "future" label.

Jun 13 '25 08:06 jeromekelleher

@jeromekelleher it hasn't yet. I wanted to get all of the underlying machinery for two-locus stats worked out before refactoring. Once my next PR goes through, I can take care of this. I think it'd be nice to have parity between the Python and C code (the Python code in test_ld_matrix.py was created to match the ideas laid out here).

Also, the ability to specify the row index in the various methods for bit arrays would add a lot of clarity to the code where they're consumed (especially where we're accessing a lot of rows as temporary variables).
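As a rough illustration of the row-index point above, the set operations could take the row as an explicit argument, roughly as follows. The rowset_* names, the caller-owned storage and the 32-bit words are all hypothetical and chosen only for this sketch; this is not the current tskit interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view over caller-owned storage: num_rows rows of
 * row_words 32-bit words each, laid out row after row. */
typedef struct {
    size_t num_rows;
    size_t row_words;
    uint32_t *data;
} rowset_t;

static uint32_t *
rowset_row(const rowset_t *self, size_t row)
{
    assert(row < self->num_rows);
    return self->data + row * self->row_words;
}

/* out[out_row] = a[a_row] intersected with b[b_row] */
static void
rowset_intersect(const rowset_t *a, size_t a_row, const rowset_t *b, size_t b_row,
    rowset_t *out, size_t out_row)
{
    assert(a->row_words == b->row_words && a->row_words == out->row_words);
    for (size_t i = 0; i < a->row_words; i++) {
        rowset_row(out, out_row)[i] = rowset_row(a, a_row)[i] & rowset_row(b, b_row)[i];
    }
}

/* self[row] = self[row] minus other[other_row] */
static void
rowset_subtract(rowset_t *self, size_t row, const rowset_t *other, size_t other_row)
{
    assert(self->row_words == other->row_words);
    for (size_t i = 0; i < self->row_words; i++) {
        rowset_row(self, row)[i] &= ~rowset_row(other, other_row)[i];
    }
}

/* self[row] = self[row] union other[other_row] */
static void
rowset_union(rowset_t *self, size_t row, const rowset_t *other, size_t other_row)
{
    assert(self->row_words == other->row_words);
    for (size_t i = 0; i < self->row_words; i++) {
        rowset_row(self, row)[i] |= rowset_row(other, other_row)[i];
    }
}

/* Number of members in self[row], clearing the lowest set bit each step. */
static size_t
rowset_count(const rowset_t *self, size_t row)
{
    size_t count = 0;
    for (size_t i = 0; i < self->row_words; i++) {
        for (uint32_t w = rowset_row(self, row)[i]; w != 0; w &= w - 1) {
            count++;
        }
    }
    return count;
}
```

Compared with fetching rows into temporaries first, each call site states which rows are read and which is written, which is where most of the clarity would come from.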

Jun 13 '25 22:06 lkirk

Refactor tsk_bit_array_* structures