tsinfer

Split large ancestor groups up for both caching and dask scheduling.

Open benjeffery opened this issue 2 years ago • 0 comments

We have very large ancestor groups towards the end of matching. As each of these takes over a month of CPU time, it would be best to split them up. This would let us resume a group mid-way and use smaller dask.bag partitions. Smaller partitions improve utilization in two ways: less work is lost when a task is interrupted by worker roll-over, and less worker time is wasted at the end of a group because smaller tasks tessellate better.
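A rough sketch of the idea, not tsinfer's actual code: it assumes the ancestors within a group can be matched independently, and the names `split_group`, `match_group_resumable`, `match_chunk` and `cache` are hypothetical. A group is cut into fixed-size chunks, and each finished chunk is recorded in a cache so that a restart only redoes the chunks that are missing.

```python
from typing import Callable, Dict, List, Sequence


def split_group(ancestor_ids: Sequence[int], max_chunk_size: int) -> List[List[int]]:
    """Split one ancestor group into chunks of at most max_chunk_size ancestors."""
    if max_chunk_size < 1:
        raise ValueError("max_chunk_size must be at least 1")
    return [
        list(ancestor_ids[start:start + max_chunk_size])
        for start in range(0, len(ancestor_ids), max_chunk_size)
    ]


def match_group_resumable(
    chunks: List[List[int]],
    match_chunk: Callable[[List[int]], List[int]],
    cache: Dict[int, List[int]],
) -> List[int]:
    """Match every chunk, skipping chunks whose results are already cached.

    `cache` maps chunk index -> results. Here it is a plain dict; in practice
    it would need to be persisted (hypothetical) to survive a restart.
    """
    for index, chunk in enumerate(chunks):
        if index not in cache:
            # Only uncached chunks are recomputed after an interruption.
            cache[index] = match_chunk(chunk)
    # Reassemble the results in the original ancestor order.
    return [result for index in range(len(chunks)) for result in cache[index]]
```

With dask, each chunk could become its own partition, for example via `dask.bag.from_sequence(chunks, partition_size=1)`, so that an interrupted worker loses at most one chunk of work. The right chunk size is a trade-off between scheduling overhead and the amount of work lost per interruption.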

benjeffery · Jun 18 '23 00:06