kerchunk
Cloud-friendly access to archival data
Hi, I am trying to append JSON references to a parquet store with

```
MultiZarrToZarr.append(
    [json],
    out,
    coo_map={"time": "cf:time"},
    concat_dims=["time"],
    identical_dims=["lat", "lon", "plev", "value"],
    remote_protocol="file",
).translate()
```

and I ran into the issue that...
When combining along a new dimension using `coo_map`, `MultiZarrToZarr` fails to concatenate dimension coordinate variables, despite concatenating coordinate variables just fine. Minimal example: ```python from kerchunk.hdf import SingleHdf5ToZarr # Set...
I'm seeing what looks like non-deterministic behaviour with `MultiZarrToZarr`. It happens both with and without writing to parquet. Sometimes I get the expected result written, sometimes I don't. See the...
Worth having a changelog? Was 0.2.4 supposed to get a tag on GitHub? The easiest way is to use GitHub to compare code between releases. Nice work on the 0.2.4 release!...
@martindurant, looks like we still have a single-value variable problem. In these AWS Open Data NetCDF files, the variable 'spherical' has a single int32 value but it becomes a float64...
Hey 👋, first of all, thanks for this awesome project! It really makes working with large collections of data so much easier and I greatly appreciate the effort! Unfortunately, I...
I have some NetCDF UKCP data with a variable called "yyyymmdd" that is stored in the Kerchunk file like so: ``` "yyyymmdd/.zarray": "{\"chunks\":[1,64],\"compressor\":null,\"dtype\":\"|S1\",\"fill_value\":\"IA==\",\"filters\":null,\"order\":\"C\",\"shape\":[3600,64],\"zarr_format\":2}", "yyyymmdd/.zattrs": "{\"_ARRAY_DIMENSIONS\":[\"time\",\"string64\"],\"long_name\":\"yyyymmdd\",\"units\":\"1\"}", "yyyymmdd/0.0": "19801201", "yyyymmdd/1.0": "19801202", "yyyymmdd/2.0":...
Kerchunk doesn't properly decode the JSON for zarr array-level attributes, instead leaving dictionaries as long strings. For example: ```python # create example netCDF4 file xr.tutorial.open_dataset('air_temperature').to_netcdf('air.nc') kerchunk.backends.SingleHdf5ToZarr('air.nc', inline_threshold=300).translate() ``` ```python {'version':...
I would appreciate having more control over the way Kerchunk writes "refs", especially control over the chunking. Context: I previously used fsspec and kerchunk to store my data while...
Hello, thanks for this lib! I ended up rewriting the scan and consolidate parts from your tutorial several times. I thought this small CLI would be of interest, when...