save dataset into netCDF with compression

Open JanisGailis opened this issue 8 years ago • 5 comments

Each SST file takes 16 MB on disk but 1 GB when uncompressed. Running the averaging now results in each monthly time slice being written to disk as an uncompressed netCDF dataset, as no compression is applied. It would be beneficial to apply compression upon saving, so that we have an 'uncompress-process-compress' pipeline.
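
For reference, a minimal sketch of the size mismatch, using xarray (the file and variable names here are illustrative, not the actual ones):

import xarray as xr

# xarray records the source compression settings in each variable's
# .encoding dict when reading netCDF-4 data.
ds = xr.open_dataset('sst_daily.nc')
print(ds['analysed_sst'].encoding.get('zlib'))  # True if deflate-compressed on disk

# Size of the fully decoded data in memory, to compare with the file size.
print(ds.nbytes / 1e9, 'GB uncompressed')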

JanisGailis avatar Feb 03 '17 10:02 JanisGailis

@kbernat No need to take immediate action on this. It may well get sorted out as I continue working on the daily->monthly averaging. I'll let you know if I need help!

JanisGailis avatar Feb 03 '17 10:02 JanisGailis

Some info about compressing netCDF variables: http://unidata.github.io/netcdf4-python/#section9

kbernat avatar Mar 02 '17 16:03 kbernat

For each variable you can set a specific compression, like:

variable.encoding.update({'zlib': True, 'complevel': 9})

or specify it as a parameter in dataset.to_netcdf()

dataset.to_netcdf(...,
                  encoding={'var_name': {'zlib': True, 'complevel': 9}})
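
Putting it together, a minimal sketch that applies the same deflate settings to every data variable on save (file names are illustrative):

import xarray as xr

ds = xr.open_dataset('sst_monthly.nc')

# One encoding entry per data variable; zlib=True enables deflate,
# complevel ranges from 1 (fastest) to 9 (smallest output).
encoding = {name: {'zlib': True, 'complevel': 9} for name in ds.data_vars}
ds.to_netcdf('sst_monthly_compressed.nc', encoding=encoding)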

kbernat avatar Mar 10 '17 15:03 kbernat

We could add compression control parameters to the write_netcdf() operation. Doing so usually also requires providing "reasonable" chunk sizes for large datasets.
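
A hypothetical sketch of what such an operation could look like (the signature and the parameter names zlib, complevel, and chunksizes are illustrative, not an actual cate API):

import xarray as xr

def write_netcdf(ds: xr.Dataset, file: str,
                 zlib: bool = True, complevel: int = 4,
                 chunksizes: tuple = None):
    # netCDF-4 deflate compresses chunk by chunk, so large variables
    # also need sensible chunk sizes for writing and later reads to
    # perform well.
    encoding = {}
    for name, var in ds.data_vars.items():
        enc = {'zlib': zlib, 'complevel': complevel}
        # Apply chunk sizes only when they match the variable's rank,
        # e.g. (1, 512, 512) for a (time, lat, lon) variable.
        if chunksizes is not None and len(chunksizes) == var.ndim:
            enc['chunksizes'] = chunksizes
        encoding[name] = enc
    ds.to_netcdf(file, encoding=encoding)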

forman avatar Sep 21 '17 07:09 forman

I was going to open an issue on this myself, but I see it's already here. I think this is an important thing to do. When I wrote out the results of the monthly aggregation of the SST dataset, the netCDF file was 240 GB(!), but with gzip it came down to 27 GB.

kjpearson avatar Jul 19 '18 07:07 kjpearson