cf-python
HDF5 chunk defaults and handling
The current behaviour when reading a chunked file is somewhat surprising (to me). If one reads this variable:
float UM_m01s02i205_vn1106(time, latitude, longitude) ;
// uninteresting attributes skipped for this issue
UM_m01s02i205_vn1106:_Storage = "chunked" ;
UM_m01s02i205_vn1106:_ChunkSizes = 1, 1920, 2560 ;
I see the following unexpected result:
In [30]: g = cf.read('double-chunking-testc.nc')[0]
In [31]: g.data.nc_hdf5_chunksizes()
Out[31]: ()
This is not a bug, insofar as it is the expected behaviour of the code: by construction, cf-python currently doesn't remember the HDF5 chunk sizes from the read.
Should it? If so, it could be done, possibly with caveats on when remembering them is a sensible thing to do. The chunk sizes may well have to be forgotten when certain operations are applied (e.g. when aggregating files with different HDF5 chunks, when subspacing, or when adding, removing or transposing dimensions).
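To make the caveats concrete, here is a rough sketch in plain Python of one possible "remember, but forget when in doubt" policy. None of these functions exist in cf-python; the names `chunks_after_aggregation`, `chunks_after_subspace` and `chunks_after_transpose` are hypothetical, and the rules themselves (such as clipping to the subspaced size) are assumptions for discussion rather than agreed behaviour.

```python
from typing import Optional, Sequence, Tuple

# None means "no HDF5 chunk sizes are remembered".
Chunks = Optional[Tuple[int, ...]]


def chunks_after_aggregation(per_file_chunks: Sequence[Chunks]) -> Chunks:
    """Keep the chunk sizes only if every input file agrees on them."""
    unique = set(per_file_chunks)
    if len(unique) == 1:
        return next(iter(unique))
    # Files disagree (or there were no files): forget the chunk sizes.
    return None


def chunks_after_subspace(chunks: Chunks, new_shape: Tuple[int, ...]) -> Chunks:
    """Clip each chunk size to the size of the subspaced dimension.

    Forget the chunk sizes if the number of dimensions has changed or
    if any dimension has become empty.
    """
    if chunks is None or len(chunks) != len(new_shape):
        return None
    if any(n < 1 for n in new_shape):
        return None
    return tuple(min(c, n) for c, n in zip(chunks, new_shape))


def chunks_after_transpose(chunks: Chunks, axes: Sequence[int]) -> Chunks:
    """Reorder the chunk sizes to follow a permutation of the axes.

    `axes` uses the numpy.transpose convention: new axis i is old
    axis axes[i]. Anything that is not a full permutation forgets.
    """
    if chunks is None or sorted(axes) != list(range(len(chunks))):
        return None
    return tuple(chunks[i] for i in axes)
```

With the chunk sizes from the variable above, `chunks_after_subspace((1, 1920, 2560), (10, 100, 2560))` would give `(1, 100, 2560)`, while aggregating that file with one chunked as `(1, 960, 1280)` would give `None`, i.e. fall back to whatever the default is at write time.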
Another V4.0 issue!