tskit icon indicating copy to clipboard operation
tskit copied to clipboard

Move C stats methods to ``stats.h``

Open jeromekelleher opened this issue 2 years ago • 2 comments

The trees.c file is pretty big, and there's a lot of code involved in the general stats. Also, there will be stats we want to compute which use the variant decoding, which means we would need to start including genotypes.h in trees.h. This would be a bit circular, and mean that the library isn't really modular (from a C compile perspective) any more.

So, moving tsk_treeseq_diversity, (et al) tsk_treeseq_general_stat into the existing stats.c/h file seems like a good option (which can then include genotypes.h). The stats.c file just contains the LD calculator, which is seems we'll continue to want (some version of) for some time to come, as it has orthogonal performance advantages to #2805.

We should also probably include tsk_treeseq_genealogical_nearest_neighbours and tsk_treeseq_mean_descendants.

This would still leave the trees.c file plenty big.

The only downside I guess is that all of the tsk_treeseq "methods'' are no longer compiled together, but I don't think anyone will get too upset about that.

Any objections? @molpopgen, would this affect the rust packaging at all?

@lkirk - any thoughts? I guess we'd do this immediately after your #2805 drops?

jeromekelleher avatar Aug 14 '23 13:08 jeromekelleher

It doesn't affect anything. The rust build works by compiling tskit-c and then grabbing the public API into a namespace. So as long as all things in header files remain visible, we're not affected by code organization changes.

molpopgen avatar Aug 14 '23 15:08 molpopgen

@jeromekelleher I think that the reorganization sounds like a great idea. I agree with what you've proposed.

lkirk avatar Aug 25 '23 21:08 lkirk