Ben Jeffery issues

Results: 116 issues of Ben Jeffery

Fetch ancestors within dask worker

There's no point doing the fetch from disk on the main process, as all the dask workers are idle while it does this.
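A minimal sketch of the idea, assuming the work can be described by a small descriptor (a path plus a byte range) that the worker resolves itself. It uses the standard library's `ThreadPoolExecutor` as a stand-in for dask workers, and `load_chunk`, `match_chunk` and `run` are hypothetical names, not tsinfer functions.

```python
from concurrent.futures import ThreadPoolExecutor


def load_chunk(path, start, stop):
    # Runs inside the worker: open the file and read only the requested bytes.
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(stop - start)


def match_chunk(path, start, stop):
    # Hypothetical stand-in for matching: fetch in the worker, then process.
    data = load_chunk(path, start, stop)
    return sum(data)


def run(path, ranges, max_workers=4):
    # The main process only submits small (path, start, stop) descriptors,
    # so it never blocks on disk reads while the workers sit idle.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(match_chunk, path, a, b) for a, b in ranges]
        return [f.result() for f in futures]
```

With this shape the disk reads overlap across workers instead of being serialised on the submitting process.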

Improve ancestor fetching

Since #828 was merged we no longer access ancestors in order when matching, and we create a separate `chunk_iterator` for each ancestor grouping. For large datasets on high-latency filesystems we are...

Add resume for sample matching

#827 added resume for ancestor matching - we should also do this for sample matching.

Document resume option

https://github.com/tskit-dev/tsinfer/pull/827 introduced resume. Once we're happy with the API and functionality it should be documented.

Out-of-core ancestor matching

Ancestor matching with 100k diploid sample data sets is looking like it will take more than a month if the number of cores available is in the 20-30 range (a...

Speed up bit unpacking in `generate_ancestors`

There are a few ways to go about this; for example, some CPUs have specific instructions for it. After some research the most portable and robust way appears to be...
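For readers unfamiliar with the operation, here is a small Python illustration of what bit unpacking means: expanding bytes that each hold eight packed 0/1 values into one value per element. The preview above is truncated, so this is not a claim about which approach the issue settles on. The per-byte lookup table, the most-significant-bit-first order and the name `unpack_bits` are assumptions made for the example.

```python
# Precompute, for each of the 256 byte values, its 8 bits (most significant first).
BYTE_TO_BITS = [
    tuple((value >> shift) & 1 for shift in range(7, -1, -1))
    for value in range(256)
]


def unpack_bits(packed, num_bits):
    """Expand packed bytes into a list of 0/1 ints, truncated to num_bits."""
    bits = []
    for byte in packed:
        # One table lookup per byte replaces eight shift-and-mask operations.
        bits.extend(BYTE_TO_BITS[byte])
    return bits[:num_bits]
```

The `num_bits` argument drops the padding bits in the final byte when the number of values is not a multiple of eight.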

sgkit: Workflow documentation

Document the workflow from VCF to inference via an sgkit dataset, with a couple of examples.

Change SgkitSampleData to accept a path or store.

The API for this currently only accepts a path. It would be better if it accepted either a path or a zarr store as an argument. For example, allowing the...
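One way such an argument could be normalised is sketched below. `resolve_store_argument` is a hypothetical helper, not part of tsinfer, and it assumes that anything which is not a string or path-like object is an already-constructed store to be passed through unchanged.

```python
import os


def resolve_store_argument(path_or_store):
    """Return ("path", str) for filesystem paths, ("store", obj) otherwise."""
    if isinstance(path_or_store, (str, os.PathLike)):
        # Normalise str and pathlib.Path inputs to a plain string path.
        return "path", os.fspath(path_or_store)
    # Assumption: any other object is a store and is handed on as-is.
    return "store", path_or_store
```

Checking for paths rather than for a particular store class keeps the helper independent of which store types the underlying library provides.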

sgkit: Error out on multiple contigs

The error should include a code snippet for filtering to a single contig.
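A sketch of what such an error might look like. `check_single_contig` is a hypothetical helper, and the filtering hint embedded in the message assumes an xarray-backed dataset named `ds` with a `variant_contig` variable on a `variants` dimension; it has not been checked against tsinfer itself.

```python
def check_single_contig(variant_contig):
    """Raise if the variants span more than one contig index."""
    contigs = sorted(set(int(c) for c in variant_contig))
    if len(contigs) > 1:
        raise ValueError(
            f"Found variants on {len(contigs)} contigs (indexes {contigs}); "
            "only a single contig is supported. Filter the dataset first, "
            "for example:\n"
            # Assumed xarray-style filter; adjust to the real dataset layout.
            f"    ds = ds.isel(variants=(ds.variant_contig == {contigs[0]}).values)"
        )
```

Putting a copy-pasteable line in the message saves the user a trip to the documentation when they hit the error.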

sgkit: Use sequence length from dataset

See also https://github.com/pystatgen/sgkit/issues/464