
Update cmor.c

Open cofinoa opened this issue 2 months ago • 0 comments

Use netCDF4 DEFAULT_CHUNK_SIZES for chunked variables and coordinates/axes.

This relates to issue #601, which explains that a chunk size of 1 for coordinates/axes such as time has a severe performance impact when reading those netCDF variables.
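To make the cost concrete, here is a minimal arithmetic sketch in C. It only counts how many chunks a 1-D axis is split into for a given chunk length; it does not call netCDF. The helper names are hypothetical, and the 4 KiB figure is the default quoted from the NUG for variables with a single unlimited dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many values of value_bytes each fit in one
 * chunk of chunk_bytes (integer division, remainder discarded). */
size_t values_per_chunk(size_t chunk_bytes, size_t value_bytes)
{
    assert(value_bytes > 0);
    return chunk_bytes / value_bytes;
}

/* Hypothetical helper: number of chunks needed to store n_values along a
 * 1-D axis when each chunk holds chunk_len values (ceiling division). */
size_t chunks_needed(size_t n_values, size_t chunk_len)
{
    assert(chunk_len > 0);
    return (n_values + chunk_len - 1) / chunk_len;
}
```

For an illustrative daily time axis of 36500 double values, `chunks_needed(36500, 1)` gives 36500 chunks, while a 4 KiB chunk (`values_per_chunk(4096, sizeof(double))`, 512 values where a double is 8 bytes) gives `chunks_needed(36500, 512)`, that is 72 chunks. Each chunk is a separate storage unit to locate and read, which is the likely source of the slowdown described in the issue.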

The netcdf-c library applies default chunk sizes to netCDF4/HDF5 files when the chunksizes parameter is NULL.

For the current netcdf-c release (version 4.9.2):

  • nc_def_var_chunking:

    [...] Chunk sizes may be specified with the chunksizes parameter or default sizes will be used if that parameter is NULL. [...]

  • See Default Chunking Scheme from netCDF User Guide (NUG):
    • [...] variables that only have a single unlimited dimension [...] the [default] chunk sizes for such variables are limited to 4KiB

    • [...] Currently the netCDF default chunk size is 4MiB, which is reasonable for filesystems on high-performance computing platforms [...]

    • [...] The current default chunking strategy of the netCDF library is to balance access time along any of a variable's dimensions, by using chunk shapes similar to the shape of the entire variable but small enough that the resulting chunk size is less than or equal to the default chunk size. This differs from an earlier default chunking strategy that always used one for the length of a chunk along any unlimited dimension, and otherwise divided up the number of chunks along fixed dimensions to keep chunk sizes less than or equal to the default chunk size. [...]

  • To change the default chunk cache size for all variables, call nc_set_chunk_cache() before opening the file; to change it for a single variable, use nc_set_var_chunk_cache().
  • Related HDF5 function: H5Pset_cache
  • This PR proposes DEFAULT chunking not only for the time coordinate/axis but also for data variables that have unlimited dimensions.

cofinoa, Apr 09 '24 14:04