Save dataset into netCDF with compression
Each SST file takes 16 MB on disk, but 1 GB when uncompressed. Running the averaging now results in each monthly time slice being written to disk as an uncompressed netCDF dataset, because no compression is applied on output. It would be beneficial to compress the data when saving, so that we get an 'uncompress-process-compress' pipeline.
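For illustration, a minimal sketch of such a pipeline with xarray (the file pattern, output name, the variable name analysed_sst and the monthly resampling are placeholders, not the actual workflow):

import xarray as xr

# Open the (internally compressed) daily SST files; the netCDF library
# decompresses transparently on read.
daily = xr.open_mfdataset('sst_daily_*.nc')

# Process: aggregate the daily fields to monthly means.
monthly = daily.resample(time='1MS').mean()

# Write the result back with compression enabled, so the monthly output
# does not blow up to the uncompressed size on disk.
monthly.to_netcdf('sst_monthly.nc',
                  encoding={'analysed_sst': {'zlib': True, 'complevel': 4}})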
@kbernat No need to take immediate action on this. It may be that this gets addressed along the way as I continue working on the daily->monthly averaging. I'll let you know if I need help!
Some info about compressing netCDF variables: http://unidata.github.io/netcdf4-python/#section9
For each variable you can set specific compression settings, like:
variable.encoding.update({'zlib': True, 'complevel': 9})
or specify it via the encoding parameter of dataset.to_netcdf():
dataset.to_netcdf(...,
                  encoding={'var_name': {'zlib': True, 'complevel': 9}})
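To apply the same settings to every data variable at once, the encoding dict can also be built programmatically. A small sketch (file names and the compression level are placeholders):

import xarray as xr

dataset = xr.open_dataset('input.nc')

# One encoding entry per data variable: zlib deflate at the chosen level.
encoding = {name: {'zlib': True, 'complevel': 9} for name in dataset.data_vars}

dataset.to_netcdf('output_compressed.nc', encoding=encoding)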
We could add compression control parameters to the write_netcdf() operation. This usually also requires providing "reasonable" chunk sizes for large datasets.
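As a sketch of what that could look like (write_netcdf() is the existing operation, but the compress, comp_level and chunk_sizes parameters below are only a suggestion, mapped onto xarray's encoding):

import xarray as xr

def write_netcdf(dataset: xr.Dataset, file_path: str,
                 compress: bool = True, comp_level: int = 4,
                 chunk_sizes: dict = None):
    """Write *dataset* to *file_path* with optional deflate compression
    and explicit netCDF chunking per variable (sketch only)."""
    encoding = {}
    for name, var in dataset.data_vars.items():
        enc = {}
        if compress:
            enc.update(zlib=True, complevel=comp_level)
        if chunk_sizes:
            # chunksizes must be given in the variable's dimension order
            # and must not exceed the dimension sizes.
            enc['chunksizes'] = tuple(
                min(chunk_sizes.get(dim, size), size)
                for dim, size in zip(var.dims, var.shape))
        encoding[name] = enc
    dataset.to_netcdf(file_path, encoding=encoding)

Something like chunk_sizes={'time': 1, 'lat': 1024, 'lon': 1024} could be a starting point for a gridded time series, but what counts as "reasonable" depends on the expected access pattern.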
I was going to open an issue on this myself, but I see it's already here. I think this is an important thing to do. When I wrote out the results of the monthly aggregation of the SST dataset, the netCDF file was 240 GB(!), but with gzip that came down to 27 GB.