Xarray serialization warning when saving dataset

Open tomwhite opened this issue 2 years ago • 0 comments

From #785:

import sgkit as sg
import sgkit.io.vcf as sgvcf
sgvcf.vcf_to_zarr("sgkit/tests/io/vcf/data/sample.vcf.gz", "sample.vcf.gz.zarr")
ds = sg.load_dataset("sample.vcf.gz.zarr")
sg.save_dataset(ds, "sample2.vcf.gz.zarr", mode="w")

prints the warning:

SerializationWarning: variable None has data in the form of a dask array with dtype=object, which means it is being loaded into memory to determine a data type that can be safely stored on disk. To avoid this, coerce this variable to a fixed-size dtype with astype() before saving it.
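For reference, the coercion the warning asks for could look roughly like the sketch below. It uses plain NumPy to show the idea. `to_fixed_size` is a hypothetical helper (not part of sgkit or xarray), and it assumes the object array holds only Python strings.

```python
import numpy as np


def to_fixed_size(values: np.ndarray) -> np.ndarray:
    """Coerce an object-dtype array of strings to a fixed-width unicode dtype.

    Hypothetical helper for illustration; assumes every element is a str.
    """
    if values.dtype != object:
        return values  # already a fixed-size dtype, nothing to do
    # The longest string determines the item width (at least 1 character).
    max_len = max((len(v) for v in values.ravel()), default=1)
    return values.astype(f"U{max(max_len, 1)}")


# Example data shaped like an allele table (made up for this sketch).
alleles = np.array([["A", "C"], ["GTC", "G"]], dtype=object)
fixed = to_fixed_size(alleles)
print(fixed.dtype)
```

On a dataset the same idea would be applied per variable with `astype()` before calling `sg.save_dataset`. For a dask-backed variable, finding the maximum string length still requires a pass over the data, so this moves the cost rather than removing it.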

There is an upstream xarray discussion about this: https://github.com/pydata/xarray/discussions/5769. #643 is related too.

tomwhite, May 06 '22 08:05