Benchmarking the Two-Locus Framework and LD Calculator

Open lkirk opened this issue 2 months ago • 11 comments

For the purposes of assessing how the two-locus framework performs relative to the LD Calculator, I've performed a series of benchmarks. In our initial benchmarks, the two-locus framework was rather slow relative to the LD Calculator (up to 10x slower). We went ahead and improved some of the low-hanging inefficiencies in the code and now I'm happy to report that the two-locus framework is within 2x the speed of the LD Calculator, and in some cases faster. I have created a document that attempts to thoroughly detail the changes made and the reasoning behind them: benchmarks.pdf. For brevity, I've listed some of the highlights below for a high-level overview.

Here is a brief description of the changes that we made to increase performance:

Base case: The original code that is currently in tskit (link).
Malloc out of hot loop: Preallocate a structure of arrays used for temporary calculations instead of allocating and freeing for every pair of sites (link).
Refactor Bit Arrays: Refactor bit array interface, removing the need for temporary arrays in many cases. All functions now take a row index as a parameter (link).
Precompute Allele Counts and Biallelic Summary Function: Store precomputed bit arrays for each sample set and allele and the count of samples for each allele. Introduce a biallelic summary function that avoids multiple redundant computations, leaving the original normalized summary function for multiallelic sites (link).

Python and C tests are all passing for each of these patches.

We benchmark each change (in our benchmarks, each change is layered on the next) on a set of tree sequences with the following parameters:

parameter	value
r	1e-8
$N_e$	1e4
L	2e6

With ploidy=1 and sample sizes ranging from $10^2$ - $10^5$. We sample 15 replicates to obtain these results:

In these plots the Relative difference is $(\textrm{tl} - \textrm{ldc}) / \textrm{ldc}$ where ldc is the LD Calculator and tl is the two-locus framework.

Next, we generated a larger tree sequence with L=6e6 and sample sizes ranging from $10^3$ - $10^6$. Comparing all optimizations on the smaller (panel A) and larger (panel B), we get the following results:

I'm not sure that there's any desire to deprecate/remove the LD Calculator at this point, but at least this provides some real-world numbers on the relative performance between these two methods. I'd like to incorporate these patches into the codebase if at all possible.

cc @jeromekelleher @petrelharp @apragsdale

Sep 29 '25 13:09 lkirk