sgkit
Utilities to help with and/or automatically select VCF ingestion parameters
Currently, when converting large VCFs to sgkit, it is hard to predict the Dask worker RAM usage and then to tune the vcf_to_zarr chunk length parameters to balance RAM usage against the number of chunks. It would be possible to have a utility that takes a VCF and suggests (or sets) good values, and reports the RAM usage that would result. This would help the user configure the cluster so that the workers don't crash.
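
As a starting point for discussion, here is a rough sketch of the arithmetic such a utility might do. It is not an existing sgkit API: `FieldSpec`, `chunk_bytes`, and `suggest_chunk_length` are hypothetical names, and the per-field byte sizes, the safety factor, and the chunk length cap are assumptions that would need to be checked against real conversions. It only counts the uncompressed in-memory size of per-call arrays for one chunk and ignores parsing overhead, INFO/FORMAT string fields, and compression buffers.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one per-call array."""
    name: str
    bytes_per_call: int  # bytes used per (variant, sample) pair


def chunk_bytes(fields: Sequence[FieldSpec], n_samples: int, chunk_length: int) -> int:
    """Uncompressed size in bytes of one chunk of `chunk_length` variants."""
    per_variant = n_samples * sum(f.bytes_per_call for f in fields)
    return chunk_length * per_variant


def suggest_chunk_length(
    fields: Sequence[FieldSpec],
    n_samples: int,
    worker_ram_bytes: int,
    safety_factor: int = 4,        # assumption: headroom for copies and parsing
    max_chunk_length: int = 10_000,  # assumption: upper cap on chunk length
) -> int:
    """Largest chunk length whose chunk fits in worker RAM / safety_factor."""
    per_variant = n_samples * sum(f.bytes_per_call for f in fields)
    budget = worker_ram_bytes // safety_factor
    return max(1, min(max_chunk_length, budget // per_variant))


# Assumed sizes for a diploid dataset: 1-byte alleles and 1-byte booleans.
FIELDS = [
    FieldSpec("call_genotype", 2),         # 2 alleles x int8
    FieldSpec("call_genotype_mask", 2),    # 2 alleles x bool
    FieldSpec("call_genotype_phased", 1),  # 1 x bool
]

if __name__ == "__main__":
    n_samples = 100_000
    ram = 4 * 1024**3  # 4 GiB per worker
    length = suggest_chunk_length(FIELDS, n_samples, ram)
    size_mib = chunk_bytes(FIELDS, n_samples, length) / 1024**2
    print(f"suggested chunk length: {length} (~{size_mib:.0f} MiB per chunk)")
```

A real utility would presumably read the sample count and the declared fields from the VCF header rather than take them as arguments, and would calibrate the safety factor by measuring actual worker memory.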