Tom White
> I assume these are two-digit base-10 values, which compress poorly when converted to float32? This is a common issue with converting these types of per-genotype floating-point values,...
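To illustrate the compression point above (a minimal sketch using synthetic data, not the actual VCF): two-digit decimals stored as text are short, low-entropy strings, while their float32 bit patterns are byte streams that general-purpose compressors often handle less well. The values below are made up purely for the demonstration.

```python
import bz2
import random
import struct

random.seed(0)

# Hypothetical per-genotype values with two base-10 digits of precision
# (0.00 .. 0.99); illustrative only, not drawn from the test VCF.
values = [random.randrange(100) / 100 for _ in range(50_000)]

# Text encoding: roughly what a VCF stores, e.g. "0.37"
as_text = "\n".join(f"{v:.2f}" for v in values).encode()

# float32 encoding: what a naive conversion to a float32 Zarr array stores
as_f32 = struct.pack(f"<{len(values)}f", *values)

print(len(bz2.compress(as_text)), len(bz2.compress(as_f32)))
```

Comparing the two printed sizes shows how the choice of encoding, not just the compressor, drives the final file size.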
I now have a representative VCF that is 17MB in size (compressed). Running `vcf_to_zarr` on it with the default settings:

```python
vcf_to_zarr(input=vcf_file, fields=["INFO/*", "FORMAT/*"], output=output, max_alt_alleles=1)
```

produced Zarr files...
@ravwojdyla pointed out that #80 is related.
With #943 I was able to get the Zarr size down to about 16% larger than genozip on the test file (using bzip2).
Thanks for opening this @sanjaynagi!

> The only thing I was going to do was fix the seed for the simulated data, and ensure the values returned are the same...
> Switching the order seems okay to me... It should only matter for types that implement both. And if a type supports `__array_namespace__`, ideally we should be prioritizing that as...
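A minimal sketch of the prioritization being discussed (the helper name, the legacy fallback, and the demo class are assumptions for illustration, not sgkit's actual code): when a type implements both protocols, check the array API standard's `__array_namespace__` before any legacy dispatch mechanism.

```python
def get_array_namespace(x):
    """Return the array namespace for x, preferring the array API
    standard's `__array_namespace__` protocol over legacy dispatch.
    (Hypothetical helper for illustration.)"""
    if hasattr(x, "__array_namespace__"):
        # Array API standard: the object reports its namespace directly.
        return x.__array_namespace__()
    if hasattr(x, "__array_function__"):
        # Legacy NumPy protocol: fall back to NumPy semantics.
        import numpy
        return numpy
    raise TypeError(f"{type(x).__name__} is not an array-API object")


class DemoArray:
    """Toy type implementing both protocols; the standard one should win."""

    def __array_namespace__(self, api_version=None):
        import math
        return math  # stand-in namespace, just for the demo

    def __array_function__(self, func, types, args, kwargs):
        raise NotImplementedError
```

Here `get_array_namespace(DemoArray())` returns the stand-in namespace from `__array_namespace__`, never touching the legacy path, which is the ordering proposed above.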
> Because of the way cubed retries things, the traceback doesn't give me any more useful context than this though. Is there a way to resurface the useful part of...
> > The default in dask is `None`
>
> Is it? [Looks like it's `True`](https://docs.dask.org/en/stable/_modules/dask/array/reductions.html#reduction) to me?

It's confusing because the default for `concatenate` in [`blockwise` is `None`](https://docs.dask.org/en/stable/generated/dask.array.blockwise.html#dask.array.blockwise), but...
> #### For `method='blockwise'`
>
> an error from what looks like a potential bug inside `cubed.rechunk` (from the call to `rechunk_for_blockwise` inside `flox.core.groupby_reduce`):
>
> ```python
> File ~/Documents/Work/Code/cubed/cubed/primitive/rechunk.py:124, in...
> ```
> #### For `method='cohorts'`
>
> I got to the point of cubed needing to implement `.blocks` as expected. From what I can tell, Flox only uses the shape of blocks,...
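If flox really only needs the shape of the block grid, that shape can be derived from the chunk structure alone, without materializing a `.blocks` accessor. The helper below is a sketch under that assumption; it is not cubed's or dask's actual implementation.

```python
import math

def num_blocks(shape, chunksizes):
    """Number of blocks along each axis of a regularly chunked array.

    `shape` and `chunksizes` are per-axis tuples; the last block on an
    axis may be partial, hence the ceiling division. (Illustrative helper.)
    """
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunksizes))

# A 10x25 array in 4x10 chunks has a 3x3 grid of blocks
# (blocks of length 4, 4, 2 on axis 0 and 10, 10, 5 on axis 1).
print(num_blocks((10, 25), (4, 10)))  # -> (3, 3)
```

In dask this grid shape is what `array.blocks.shape` reports, so exposing just this much might be enough for the flox code path described above.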