VirtualiZarr

problem with numpy type error (not serializable)

Open mdsumner opened this issue 8 months ago • 3 comments

I see this

from virtualizarr import open_virtual_dataset
u = 'https://thredds.nci.org.au/thredds/fileServer/gb6/BRAN/BRAN2023/daily/ocean_salt_2024_06.nc'

ds = open_virtual_dataset(u)

ds.virtualize.to_kerchunk('/tmp/test.parquet', format="parquet")
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/VirtualiZarr/virtualizarr/accessor.py", line 137, in to_kerchunk
#     refs = dataset_to_kerchunk_refs(self.ds)
#            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "/VirtualiZarr/virtualizarr/writers/kerchunk.py", line 72, in dataset_to_kerchunk_refs
#     ".zattrs": ujson.dumps(attrs),
#                ^^^^^^^^^^^^^^^^^^
# TypeError: np.int32(20) is not JSON serializable


# work around the problem by blanking out the NumPy-typed attribute
ds.attrs['NumFilesInSet'] = None
# now it works
ds.virtualize.to_kerchunk('/tmp/test.parquet', format="parquet")

I wonder if there is a general solution for NumPy-typed attributes like this? I appreciate this may be a Kerchunk topic.

(It takes a few minutes to virtualize from the URL, I'm afraid; it's a 4.3 GB file.)

mdsumner avatar Apr 14 '25 00:04 mdsumner

This is an example where the correct behavior is simply whatever the Kerchunk spec says to do / the kerchunk library actually does. Clearly throwing an error is wrong, but otherwise it would be helpful to know what the expected Kerchunk-like behavior is.

TomNicholas avatar Apr 14 '25 14:04 TomNicholas

Are you able to serialize other numpy dtypes? Presumably we must be able to?

TomNicholas avatar Apr 14 '25 14:04 TomNicholas

This attribute just needs to be coerced to a plain int.

rabernat avatar Apr 14 '25 14:04 rabernat
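
The coercion suggested above could be sketched roughly as follows. This is a minimal illustration, not VirtualiZarr's or Kerchunk's actual fix: the helper name `to_plain` is hypothetical, the standard-library `json` module stands in for `ujson`, and the attribute dict is a made-up stand-in for `ds.attrs`.

```python
import json

import numpy as np


def to_plain(value):
    """Recursively convert NumPy scalars and arrays to built-in Python types.

    Hypothetical helper for illustration only; not part of VirtualiZarr.
    """
    if isinstance(value, np.generic):
        # NumPy scalar, e.g. np.int32(20) -> 20
        return value.item()
    if isinstance(value, np.ndarray):
        # Arrays become (nested) lists of built-in values
        return value.tolist()
    if isinstance(value, dict):
        return {key: to_plain(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(val) for val in value]
    return value


# Made-up attributes mimicking the failing case from the traceback
attrs = {"NumFilesInSet": np.int32(20), "title": "example"}

clean = to_plain(attrs)
encoded = json.dumps(clean)  # no TypeError once the scalar is a plain int
print(encoded)
```

With the dataset from the report, something like `ds.attrs = to_plain(ds.attrs)` before calling `to_kerchunk` would presumably avoid the error while keeping the attribute's value, rather than setting it to `None`. Per-variable attributes may need the same treatment, and some NumPy types (for example `datetime64` or bytes scalars) would still not be JSON serializable after `.item()`.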