sgkit

Utilities to help with and/or automatically select VCF ingestion parameters

Open · benjeffery opened this issue 10 months ago · 1 comment

Currently, when converting large VCFs to sgkit, it is hard to predict the dask worker RAM usage and then tweak the vcf_to_zarr chunk length parameters to balance RAM usage against the number of chunks. It would be possible to have a utility that takes a VCF, suggests (or sets) good values, and reports the RAM usage that would result. This would help the user configure the cluster so that the workers don't crash.
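To make the idea concrete, here is a rough sketch of the kind of estimate such a utility might produce. It is not an existing sgkit API: `chunk_bytes` and `suggest_chunk_length` are hypothetical names, and the memory model (a worker holds roughly `chunk_length` variants across all samples, times a fudge factor `overhead`) is an assumption that would need checking against real conversions. Only the genotype array is modelled; other VCF fields are ignored.

```python
def chunk_bytes(chunk_length, chunk_width, ploidy=2, itemsize=1):
    """Uncompressed size in bytes of one genotype chunk
    (variants x samples x ploidy), assuming int8 calls by default."""
    return chunk_length * chunk_width * ploidy * itemsize


def suggest_chunk_length(n_samples, budget_bytes, ploidy=2, itemsize=1, overhead=4.0):
    """Largest chunk length whose estimated working set fits in budget_bytes.

    Assumption: peak worker memory is about
    chunk_length * n_samples * ploidy * itemsize * overhead,
    where `overhead` is a guessed multiplier for parser buffers and copies.
    """
    per_variant = n_samples * ploidy * itemsize * overhead
    return max(1, int(budget_bytes // per_variant))


if __name__ == "__main__":
    # Hypothetical cluster: 2 GiB usable per worker, 100,000 diploid samples.
    length = suggest_chunk_length(n_samples=100_000, budget_bytes=2 * 1024**3)
    print("suggested chunk length:", length)
    print("bytes per stored chunk at width 1000:", chunk_bytes(length, 1000))
```

A real utility would presumably read the sample count and the field list from the VCF header and calibrate `overhead` empirically rather than guessing it.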

benjeffery, Sep 04 '23 15:09