Ben Jeffery

116 issues by Ben Jeffery

When using dask for ancestor matching, we currently split the ancestors into batches of 5000 and send these to dask serially. This lets us store batch results in the resume...

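A minimal sketch of the serial batching described above, assuming ancestors are identified by integer IDs. `match_serially`, `match_batch` and the plain `resume` dict are hypothetical stand-ins (not tsinfer or dask API): `match_batch` represents whatever submits one batch to dask, and the dict represents wherever batch results are stored so a run can skip finished batches.

```python
from typing import Callable, Dict, List, Sequence

BATCH_SIZE = 5000  # batch size mentioned in the issue


def make_batches(ancestor_ids: Sequence[int], batch_size: int = BATCH_SIZE) -> List[List[int]]:
    """Split ancestor IDs into consecutive batches of at most batch_size."""
    return [list(ancestor_ids[i:i + batch_size]) for i in range(0, len(ancestor_ids), batch_size)]


def match_serially(
    ancestor_ids: Sequence[int],
    match_batch: Callable[[List[int]], object],
    resume: Dict[int, object],
    batch_size: int = BATCH_SIZE,
) -> Dict[int, object]:
    """Process batches one after another, recording each result in `resume`.

    Batches whose index is already in `resume` are skipped, so an
    interrupted run can continue from the first unfinished batch.
    """
    for index, batch in enumerate(make_batches(ancestor_ids, batch_size)):
        if index in resume:
            continue  # finished in an earlier run
        # Hypothetical stand-in for submitting this batch to dask and waiting.
        resume[index] = match_batch(batch)
    return resume
```

Because each batch waits for the previous one, parallelism is limited to what happens inside a single batch; the trade-off is that progress is saved at batch granularity.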

When sample matching with dask, the progress bar appears to count each sample twice.

@hyanwong and I sat down and thought through how ancestral allele handling from sgkit should work. The sgkit `variant_ancestral_allele` string array needs to be converted to a...

At some point after the `Splitting ultimate ancestor` log line, tsinfer tries to allocate over 128 GB for large datasets. Hopefully this can be investigated locally with smaller datasets.

Mapping these sites takes over 12 hours on large datasets; it would be good to use multiple threads for this.
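One way the per-site mapping could be spread over threads is sketched below, using only the standard library. `map_sites_threaded` and `map_site` are hypothetical names; the actual per-site work is not described in the issue, so it is passed in as a function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_sites_threaded(
    sites: Sequence[T], map_site: Callable[[T], R], num_threads: int = 4
) -> List[R]:
    """Apply `map_site` (hypothetical per-site function) to every site.

    Results are returned in the same order as `sites`. Threads only give
    a real speed-up if `map_site` releases the GIL (e.g. it spends its
    time in C code or I/O); pure-Python work would need processes instead.
    """
    if num_threads <= 1:
        # Serial fallback, handy for debugging.
        return [map_site(site) for site in sites]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(map_site, sites))
```

Keeping a serial path with the same signature makes it easy to compare the threaded version against the single-threaded result.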

To plan a large ancestor match, one needs to see plots of:
- Group size (determines parallelism)
- Ancestor size distribution (determines the RAM each worker needs)
- Total ancestor length in...
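The numbers behind such plots can be sketched as follows, assuming (hypothetically) that each ancestor's matching group and length are available as two parallel sequences. `plan_summary` is an illustrative name, and treating the largest single ancestor as a proxy for peak worker RAM is an assumption, not something stated above.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def plan_summary(
    group_ids: Sequence[int], lengths: Sequence[int]
) -> Tuple[Dict[int, int], Dict[int, int], int]:
    """Summarise ancestors for planning a large ancestor match.

    group_ids[i] is the matching group of ancestor i and lengths[i] its
    length (hypothetical inputs). Returns:
      - number of ancestors per group (bounds the available parallelism)
      - summed ancestor length per group
      - largest single ancestor length (assumed proxy for peak worker RAM)
    """
    if len(group_ids) != len(lengths):
        raise ValueError("group_ids and lengths must have the same length")
    group_size = dict(Counter(group_ids))
    total_length: Dict[int, int] = defaultdict(int)
    for group, length in zip(group_ids, lengths):
        total_length[group] += length
    largest = max(lengths, default=0)
    return group_size, dict(total_length), largest
```

Each returned dictionary maps group ID to a value, so it can be plotted directly against group order.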

Need to document all the tips and tricks for each stage of inference when working with biobank-scale data.

We have very large ancestor groups towards the end of matching. As each of these takes over a month of CPU time, it would be best to split them up. This would...