`vcf_to_zarr` creates zero-sized first chunk which results in incorrect dtype.

Open benjeffery opened this issue 2 years ago • 1 comment

@tnguyengel has hit the following error while running vcf_to_zarr with the default arguments:

  File "/home/tnguyen/conda/sgkit_main/lib/python3.10/site-packages/zarr/core.py", line 2168, in _process_for_setitem
    chunk = value.astype(self._dtype, order=self._order, copy=False)
ValueError: could not convert string to float: 'A'

This happens because concat_zarrs_optimized uses dtype=float64 to concatenate and convert the variant_allele array. It picks float64 because the first temp zarr chunk has a variant_allele dtype of float64, and that in turn is because the first temp zarr chunk is zero-sized.
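The dtype problem can be illustrated with plain NumPy, independent of sgkit or zarr. This is only a minimal sketch of the mechanism, assuming the zero-sized chunk behaves like an empty array created without an explicit dtype; the names `empty_chunk`, `alleles` and `cast_error` are made up for the example and do not come from the sgkit code.

```python
import numpy as np

# An empty array created without an explicit dtype defaults to float64.
empty_chunk = np.array([])
assert empty_chunk.dtype == np.float64

# Allele data from a later, non-empty chunk is string-typed.
alleles = np.array([["A", "T"], ["G", "C"]])


def cast_error(values, dtype):
    """Try the cast and return the error message, or None if it succeeds."""
    try:
        values.astype(dtype, copy=False)
    except ValueError as exc:
        return str(exc)
    return None


# Casting strings to the dtype inferred from the empty chunk fails.
message = cast_error(alleles, empty_chunk.dtype)
print(message)
```

The printed message starts with "could not convert string to float", which matches the error in the traceback above; the exact wording after that may differ between NumPy versions.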

I assume this is because the target_chunk_size default of 20M is smaller than the VCF header, leading to no sites being in the first chunk. I have asked her to try a larger target_chunk_size as a workaround, and will work on a proper fix.
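One possible direction for a fix, sketched here with plain NumPy arrays rather than the real temp zarr stores, is to take the dtype from the first non-empty part instead of the first part. This is an assumption about what a fix could look like, not the actual sgkit change; `first_non_empty_dtype` is a hypothetical helper.

```python
import numpy as np


def first_non_empty_dtype(parts):
    """Return the dtype of the first part with at least one element.

    Hypothetical helper: falls back to the first part's dtype when
    every part is empty.
    """
    if not parts:
        raise ValueError("no parts to inspect")
    for part in parts:
        if part.size > 0:
            return part.dtype
    return parts[0].dtype


# The first part is zero-sized (float64 by default); the rest hold alleles.
parts = [np.array([]), np.array(["A", "T"]), np.array(["G", "C"])]

dtype = first_non_empty_dtype(parts)

# Skip the empty parts and concatenate the rest using the chosen dtype.
combined = np.concatenate([p.astype(dtype) for p in parts if p.size > 0])
```

With this choice the empty leading part no longer decides the dtype, so the string alleles are concatenated without a float conversion.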

benjeffery, May 18 '23 12:05

I can reproduce this now:

  1. Open two windows
  2. Enable both windows to make them split screen
  3. Disable window 2 (This highlights window 1)
  4. Tap window 1 (even though it's active)
  5. Window 1 icon will disappear

This is a very minor issue, but it's something that confused me initially since I thought the missing icon meant something that I didn't understand (maybe it does?).

meichthys, Feb 13 '25 02:02