nowcasting_dataset

Experiment with better compression for on-disk batches

Open JackKelly opened this issue 3 years ago • 3 comments

Detailed Description

For example, pbzip2 reduces our NWP batches to 20% of their original size. Hopefully we can achieve similar reductions using "proper" NetCDF compression algorithms.

Smaller batches should be faster to load and easier to upload to public cloud / Lancium / etc.

Related issues

  • #61
  • #280
  • #498

Also, if we do find better compression, then we should probably use that better compression for our intermediate zarrs, too.

Not urgent.

JackKelly, Nov 24 '21 08:11

I remember that using something like 'gzip' made the files smaller, but then they took longer to load. See http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_netcdf.html. I'm not sure on the right balance here.

Also, some general searching turned this up (not sure how useful it is): https://www.unidata.ucar.edu/blogs/developer/entry/netcdf_compression

peterdudfield, Dec 06 '21 09:12

> I'm not sure on the right balance here

yeah, I think the only way to tell is to do a bunch of experiments

JackKelly, Dec 06 '21 09:12
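
A rough sketch of what one such experiment might look like, measuring the size vs. load-time trade-off discussed above. Everything here is an assumption for illustration: the data is synthetic (not a real NWP batch), the function names `make_synthetic_batch` and `benchmark` are made up, and the codecs are Python standard-library stand-ins (`bz2` is the algorithm that pbzip2 parallelises) rather than NetCDF's or zarr's built-in compressors. Real numbers would have to come from running the same kind of comparison on actual batches.

```python
import bz2
import lzma
import random
import struct
import time
import zlib


def make_synthetic_batch(n_values=50_000, seed=0):
    """Return raw float32 bytes for a smooth series (a stand-in, not real NWP data)."""
    rng = random.Random(seed)
    value = 280.0
    values = []
    for _ in range(n_values):
        value += rng.uniform(-0.05, 0.05)
        values.append(round(value, 1))  # coarse precision, so many values repeat
    return struct.pack(f"<{n_values}f", *values)


# Each entry maps a codec name to (compress, decompress) callables.
CODECS = {
    "zlib": (lambda b: zlib.compress(b, 6), zlib.decompress),
    "bz2": (lambda b: bz2.compress(b, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def benchmark(raw):
    """Return {codec: {"ratio": compressed/original size, "decompress_seconds": float}}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        compressed = compress(raw)
        start = time.perf_counter()
        restored = decompress(compressed)
        elapsed = time.perf_counter() - start
        assert restored == raw  # all three codecs are lossless
        results[name] = {
            "ratio": len(compressed) / len(raw),
            "decompress_seconds": elapsed,
        }
    return results


if __name__ == "__main__":
    for name, stats in benchmark(make_synthetic_batch()).items():
        print(f"{name:5s} ratio={stats['ratio']:.3f} "
              f"decompress={stats['decompress_seconds'] * 1e3:.2f} ms")
```

Decompression time is only a proxy for batch load time; an experiment on real batches would also want to time the full read from disk, since smaller files can offset slower decompression.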

tbh I wouldn't worry about better on-disk compression for v16. The compression we have now is fine, IMHO.

JackKelly, Dec 06 '21 09:12