Improve netCDF4 compression

Open brmather opened this issue 2 years ago • 1 comments

  • Add an option for quantisation by specifying significant digits
  • Replace the default zlib compression with the newer zstd compression
  • Increase the compression level to 9

brmather avatar Nov 07 '23 03:11 brmather

Some of these compression methods are not compatible with all versions of netCDF, and the quantisation controlled by the significant_digits keyword sometimes garbles the netCDF data. More investigation is required.

brmather avatar Nov 14 '23 00:11 brmather

Turns out zlib is still the best way to ensure the .nc files can be uncompressed without errors.

significant_digits does work, and will preserve nan masks so long as there are at least two significant digits.
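
To illustrate why quantisation helps zlib so much, here is a standard-library sketch that zeroes the low mantissa bits of float32 values ("bit shaving"). This is a simplified stand-in, not the algorithm the netCDF library uses, and `shave_float32`, `compressed_size`, and the synthetic data are assumptions made for the example. Keeping 7 mantissa bits corresponds to roughly two significant decimal digits.

```python
import math
import struct
import zlib


def shave_float32(value, keep_bits):
    """Zero all but the top `keep_bits` of the 23 float32 mantissa bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    (out,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return out


def compressed_size(values, level=6):
    """Size in bytes of the zlib-compressed float32 representation."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw, level))


# Synthetic smooth field with one missing value (an assumption, not real data).
values = [math.sin(i * 0.001) * 100.0 for i in range(20000)]
values[10] = math.nan

shaved = [shave_float32(v, 7) for v in values]

print("full precision:", compressed_size(values), "bytes")
print("7 mantissa bits:", compressed_size(shaved), "bytes")
print("nan preserved:", math.isnan(shaved[10]))
```

In this sketch the nan survives because the bit that marks a quiet nan sits at the top of the mantissa and is never masked out; the trailing bits that get zeroed are what give the compressor long runs of identical bytes.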

Some timings on a 3601 x 1801 grid:

  • complevel=4: 5.6 MB file in 386 ms
  • complevel=6: 5.1 MB file in 410 ms
  • complevel=9: 5.1 MB file in 550 ms
  • significant_digits=2, complevel=4: 850 KB file in 300 ms
  • significant_digits=2, complevel=6: 450 KB file in 350 ms
  • significant_digits=2, complevel=9: 311 KB file in 1 s

A complevel of 6 or 7 seems to be the best tradeoff between compression speed and file size. Specifying significant_digits yields a huge reduction in file size for a negligible speed penalty.
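
A comparison like the one above can be roughly reproduced with Python's built-in `zlib` module. This is only a sketch: it compresses a raw float32 buffer rather than writing a .nc file, the payload is synthetic, and `benchmark_levels` is a hypothetical helper, so the absolute numbers will not match the timings reported here.

```python
import math
import struct
import time
import zlib


def benchmark_levels(payload, levels=(4, 6, 9)):
    """Return {level: (compressed_bytes, seconds)} for zlib at each level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(payload, level)
        elapsed = time.perf_counter() - start
        # zlib is lossless, so the round trip must reproduce the input exactly.
        assert zlib.decompress(compressed) == payload
        results[level] = (len(compressed), elapsed)
    return results


# Synthetic smooth field standing in for a real grid (an assumption).
payload = struct.pack("<50000f", *(math.sin(i * 0.01) for i in range(50000)))

for level, (size, seconds) in benchmark_levels(payload).items():
    print(f"complevel={level}: {size} bytes in {seconds * 1e3:.1f} ms")
```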

brmather avatar Sep 13 '24 11:09 brmather

The seafloor age gridding workflow has now been modified to use the new compression options in write_netcdf_grid. The read_netcdf_grid function could be more flexible in how it reads .nc files; I will address that later.

brmather avatar Sep 17 '24 01:09 brmather

Nice job with the compression! Not easy to compress floating-point numbers.

jcannon-gplates avatar Sep 19 '24 02:09 jcannon-gplates