Tom White
Some previous work here: #432
We could document how to use Dask's progress bars with `vcf_to_zarr`.
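As a starting point for that documentation, here is a minimal sketch of Dask's `ProgressBar` context manager from `dask.diagnostics`. It does not call `vcf_to_zarr`; the `process_region` function is a hypothetical stand-in for per-region work, so the sketch only shows the wrapping pattern, not sgkit's actual behaviour.

```python
import time

import dask
from dask.diagnostics import ProgressBar


@dask.delayed
def process_region(i):
    # Hypothetical stand-in for the work done on one VCF region.
    time.sleep(0.01)
    return i * 2


# Build the task graph lazily; nothing runs yet.
tasks = [process_region(i) for i in range(20)]

# Computations triggered inside the block report progress to stdout.
with ProgressBar():
    results = dask.compute(*tasks)

total = sum(results)
print(total)
```

Note that `ProgressBar` only reports on Dask's local schedulers (threads, processes, synchronous). With a `dask.distributed` client, `dask.distributed.progress` is the equivalent to reach for.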
> which did not bring up any progress bars. I believe Dask is being invoked since the Dataset returned by this call has multiple chunks, so I'm a bit confused...
> Maybe we should fork this out into a separate discussion, so we can make some high-level decisions about how to do logging?

Yes, this would be very useful!
Hmm just found https://github.com/tqdm/tqdm#dask-integration. Also, I wonder if we can use file position within the VCF file or region as a rough proxy for progress...
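To make the file-position idea concrete, here is a rough sketch using only the standard library. It assumes an uncompressed, line-oriented file; `iter_lines_with_progress` and the tiny demo file are made up for illustration and are not part of sgkit. For compressed input the offset would have to be taken on the underlying compressed file instead, which this sketch does not attempt.

```python
import os
import tempfile


def iter_lines_with_progress(path):
    """Yield (line, fraction_read), using the byte offset as a progress proxy."""
    total = os.path.getsize(path)
    with open(path, "rb") as f:
        # readline() keeps tell() aligned with the end of the line just read.
        for line in iter(f.readline, b""):
            fraction = f.tell() / total if total else 1.0
            yield line, fraction


# Write a tiny made-up VCF-like file for the demo.
with tempfile.NamedTemporaryFile("wb", suffix=".vcf", delete=False) as tmp:
    tmp.write(b"##fileformat=VCFv4.3\n")
    tmp.write(b"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    for pos in range(1, 6):
        tmp.write(f"1\t{pos}\t.\tA\tT\t.\t.\t.\n".encode())
    demo_path = tmp.name

fractions = [frac for _, frac in iter_lines_with_progress(demo_path)]
os.remove(demo_path)

print([f"{frac:.0%}" for frac in fractions])
```

The fraction is only a rough proxy: record lengths vary, so equal byte counts do not mean equal numbers of variants, but it is cheap to compute and could feed a tqdm bar via `update()`.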
Thanks @eric-czech! I think that explains the error I posted. I tried setting the range of `threshold` to exclude 0, but I get other failures. ``` diff --git a/sgkit/tests/test_ld.py b/sgkit/tests/test_ld.py...
Thanks for opening an issue and PR to fix it @d-laub! The code looks good to me. Would you be able to add a short unit test of this function,...
It looks like the latest failures are when running tests against real VCF files, not pre-commit failures. It will need some digging to see if that is a problem introduced...
+1 to using `bcftools` for the ground truth here.
This exists as `zarr_array_sizes`, but it is not a part of the public API since it runs sequentially. Leaving this issue open to cover the parallel implementation.