
Uniform chunks, but getting "Concat input 0 has array length 30 along the concatenation axis which is not evenly divisible by chunk length 1024"

rsignell opened this issue · 5 comments

Yet another weird error from our Pangeo Training here in Bologna.

When we try to concatenate two ERA5 evaporation virtual datasets that are both chunked (1, 12, 721, 1440) along the forecast_initial_time dimension, we get:

ValueError: Cannot concatenate arrays with partial chunks because only regular chunk grids are currently supported. Concat input 0 has array length 30 along the concatenation axis which is not evenly divisible by chunk length 1024.

the reproducible notebook is here: https://gist.github.com/rsignell/0d15a989eb5ea2a54fb8efbf3a70a879

Can't figure this one out either. :(
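For anyone skimming, the failing pattern boils down to something like the sketch below. The file paths and the bare open_virtual_dataset call are placeholders (the exact call and kwargs depend on your VirtualiZarr version); the real reproduction is in the gist above.

```python
# Sketch of the failing pattern; paths and the open_virtual_dataset call are
# placeholders -- the actual reproduction is in the gist linked above.
import xarray as xr
from virtualizarr import open_virtual_dataset

vds1 = open_virtual_dataset("era5_evap_part1.nc")  # hypothetical path
vds2 = open_virtual_dataset("era5_evap_part2.nc")  # hypothetical path

# Both virtual datasets are chunked (1, 12, 721, 1440); concatenating along
# forecast_initial_time raises the ValueError quoted above.
combined = xr.concat([vds1, vds2], dim="forecast_initial_time")
```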

rsignell · Oct 16 '25

I didn't really figure out the problem here, but I noticed there was a weird data variable utc_date in addition to the primary variable we were trying to concatenate:

[screenshot: dataset repr showing the extra utc_date data variable]

And when I tried dropping that variable from each dataset before the xr.concat(), the error disappeared and it worked!

https://gist.github.com/rsignell/480ae64b16e428663a9eefe55237e8cb
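In code, the workaround looks roughly like the sketch below (reusing vds1/vds2 from the sketch in the first comment; the working version is in the gist above):

```python
# Drop the problematic utc_date variable from each virtual dataset before
# concatenating; vds1/vds2 are the virtual datasets from the earlier sketch.
import xarray as xr

combined = xr.concat(
    [vds1.drop_vars("utc_date"), vds2.drop_vars("utc_date")],
    dim="forecast_initial_time",
)
```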

But it should have worked without dropping utc_date, shouldn't it? Or at least given a better error message?

Not sure, actually, but leaving this open just in case!

rsignell · Oct 18 '25

I'm not entirely sure, but since forecast_initial_time and utc_date have chunksizes that are bigger than their shape, I think this is a duplicate of #803 (forecast_initial_time is a dimension coordinate, and thus loaded into memory, which may explain why you don't get any errors for that).

The fix could be to clip the chunksize extracted by h5py to the shape, but I don't know enough about HDF5 chunking to know whether there's a better option (cc @sharkinsspatial)
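A quick way to check this hypothesis (a sketch, with a hypothetical filename) is to compare each variable's declared HDF5 chunk layout against its shape with h5py:

```python
# Compare declared chunk layout vs. shape for the top-level variables of one
# of the ERA5 files (path is hypothetical).
import h5py

with h5py.File("era5_evap_part1.nc", "r") as f:  # hypothetical path
    for name, dset in f.items():
        if isinstance(dset, h5py.Dataset) and dset.chunks is not None:
            oversized = any(c > s for c, s in zip(dset.chunks, dset.shape))
            print(name, dset.shape, dset.chunks, "<- chunk > shape" if oversized else "")

# The clipping suggested above would then be roughly:
# chunks = tuple(min(c, s) for c, s in zip(dset.chunks, dset.shape))
```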

keewis · Oct 20 '25

@rsignell Thanks for the issue. As @keewis pointed out, this appears to be an issue with the difference in how HDF5 and Zarr define chunks. My wording here may be slightly incorrect, but IIUC HDF5 uses sparse allocation: the chunks argument defines the storage layout, and chunks are allocated on demand as data is written. In the case of your file, forecast_initial_time has a declared chunk length of 1024 but only 30 values have actually been written.

Contrast this with the Zarr v3 model, where partial chunks are only valid at the chunk grid edge and non-allocated chunks are not.

The HDF parser currently initializes the ManifestArray's chunk shape from the HDF5 array's chunks property, which, as we now see, may not match the actual chunks represented in the ChunkManifest. As @keewis suggested, I think we can make a PR so that our ManifestArray metadata creation relies on the actual materialized chunks in the file rather than the storage layout represented by the HDF5 chunks property.
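For illustration (not VirtualiZarr code): recent h5py can enumerate the chunks that were actually written, independent of the declared layout. The file and variable names below are assumed from the ERA5 example.

```python
# Enumerate the chunks that really exist in the file, as opposed to the
# declared storage layout (dset.chunks). Path/variable are hypothetical.
import h5py

with h5py.File("era5_evap_part1.nc", "r") as f:  # hypothetical path
    dset = f["utc_date"]
    print("declared chunk shape:", dset.chunks)  # e.g. (1024,) per the traceback
    print("array shape:         ", dset.shape)   # e.g. (30,)
    # Low-level API (h5py >= 2.10): iterate the allocated chunks only
    for i in range(dset.id.get_num_chunks()):
        info = dset.id.get_chunk_info(i)
        print("chunk at offset", info.chunk_offset, "-", info.size, "bytes on disk")
```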

@TomNicholas @maxrjones One question here is whether we want to take this opportunity to more closely align our ManifestArray implementation with the Zarr v3 spec. v3 arrays have a chunks property to maintain backward compatibility, but we likely want our internal implementation to use only chunk_grid for clarity. I can probably implement this as part of the PR, but it seems like it would be better left to a larger refactoring PR 🤔

sharkinsspatial · Oct 21 '25

Thanks for raising this @rsignell, and for tracking this down @keewis and @sharkinsspatial!

> v3 arrays have a chunks property to maintain backward compatibility (https://github.com/zarr-developers/zarr-python/pull/1929), but we likely want our internal implementation to use only chunk_grid for clarity. I can probably implement this as part of the PR, but it seems like it would be better left to a larger refactoring PR 🤔

Yes let's separate out the bugfix from the API change (which should ideally also keep the original name with a deprecation warning).

TomNicholas · Oct 21 '25

Currently stuck on this issue as well when trying to concatenate LEOFS virtual datasets along time. Not sure if this applies to all of the NOS OFS datasets that use FVCOM (leofs, lmhofs, loofs, lsofs, ngofs2, sfbofs, sscofs), but it at least applies to the few I have tried:

<xarray.Dataset> Size: 12MB
Dimensions:             (time: 1, nele: 11509, node: 6106, siglev: 21,
                         three: 3, DateStrLen: 26, maxnode: 11, maxelem: 9,
                         four: 4, siglay: 20)
Coordinates:
  * time                (time) datetime64[ns] 8B 2025-11-21T21:00:00
    siglev              (siglev, node) float32 513kB ManifestArray<shape=(21,...
    lon                 (node) float32 24kB ManifestArray<shape=(6106,), dtyp...
    lat                 (node) float32 24kB ManifestArray<shape=(6106,), dtyp...
    lonc                (nele) float32 46kB ManifestArray<shape=(11509,), dty...
    latc                (nele) float32 46kB ManifestArray<shape=(11509,), dty...
    siglay              (siglay, node) float32 488kB ManifestArray<shape=(20,...
Dimensions without coordinates: nele, node, three, DateStrLen, maxnode,
                                maxelem, four
Data variables: (12/56)
    nprocs              int32 4B ManifestArray<shape=(), dtype=int32, chunks=()>
    partition           (nele) int32 46kB ManifestArray<shape=(11509,), dtype...
    x                   (node) float32 24kB ManifestArray<shape=(6106,), dtyp...
    y                   (node) float32 24kB ManifestArray<shape=(6106,), dtyp...
    xc                  (nele) float32 46kB ManifestArray<shape=(11509,), dty...
    yc                  (nele) float32 46kB ManifestArray<shape=(11509,), dty...
    ...                  ...
    inundation_cells    (time, nele) int32 46kB ManifestArray<shape=(1, 11509...
    aice                (time, node) float32 24kB ManifestArray<shape=(1, 610...
    vice                (time, node) float32 24kB ManifestArray<shape=(1, 610...
    tsfc                (time, node) float32 24kB ManifestArray<shape=(1, 610...
    uuice               (time, nele) float32 46kB ManifestArray<shape=(1, 115...
    vvice               (time, nele) float32 46kB ManifestArray<shape=(1, 115...
Attributes: (12/15)
    title:                       LEOFS
    institution:                 School for Marine Science and Technology
    source:                      FVCOM_4.4.7
    history:                     model started at: 21/11/2025   19:56
    references:                  http://fvcom.smast.umassd.edu, https://githu...
    Conventions:                 CF-1.0
    ...                          ...
    River_Forcing:               THERE ARE NO RIVERS IN THIS MODEL
    GroundWater_Forcing:         GROUND WATER FORCING IS OFF!
    Surface_Heat_Forcing:        FVCOM variable surface heat forcing file:\nF...
    Surface_Wind_Forcing:        FVCOM variable surface Wind forcing:\nFILE N...
    Surface_PrecipEvap_Forcing:  SURFACE PRECIPITATION FORCING IS OFF
    Ice_Model_Forcing:           FVCOM variable surface ice model forcing:\nF...

ValueError: Cannot concatenate arrays with partial chunks because only regular chunk grids are currently supported. Concat input 0 has array length 1 along the concatenation axis which is not evenly divisible by chunk length 1024.

Note that we are already using a custom/modified HDF parser for this dataset (to fix the siglay/siglev metadata); without it, these datasets fail to open in the first place. The upside is that a fix could just be dropped into the custom parser we are already using, though I'm not 100% sure where to start with that.
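As a starting point, a scan like the one sketched in the ERA5 discussion above (but recursing into groups) would at least show which LEOFS variables carry a declared chunk size larger than their shape; the filename here is hypothetical:

```python
# Report every dataset in the file whose declared chunk layout exceeds its
# shape along some axis (likely the ones that trip the concat check).
import h5py

def report_oversized_chunks(name, obj):
    if isinstance(obj, h5py.Dataset) and obj.chunks is not None:
        if any(c > s for c, s in zip(obj.chunks, obj.shape)):
            print(f"{name}: shape={obj.shape}, declared chunks={obj.chunks}")

with h5py.File("nos.leofs.fields.nowcast.nc", "r") as f:  # hypothetical path
    f.visititems(report_oversized_chunks)
```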

pjsalisbury · Nov 24 '25