netcdf4-python
Proposal: Add zlib kwarg to netCDF4.Dataset() constructor
NetCDF zlib compression is great. However, the current Python API requires you to pass zlib=True for every variable you want compressed. Well... I want to compress ALL variables in my files without modifying the existing code that writes into those files.
The proposal here is to add an optional zlib kwarg to the netCDF4.Dataset() constructor. If you set zlib=True when opening a Dataset, the API will use that as the default value of zlib when you call createVariable().
Any thoughts on this? Would it be likely to be merged if I submit a PR that does this?
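A minimal sketch of the proposed behavior. This uses a stand-in class rather than the real netCDF4.Dataset (so the kwarg-defaulting logic is clear in isolation); the attribute name `_default_zlib` is illustrative, not part of any existing API:

```python
class Dataset:
    """Stand-in for netCDF4.Dataset illustrating the proposed zlib kwarg."""

    def __init__(self, filename, mode="r", zlib=False):
        self.filename = filename
        self.mode = mode
        self._default_zlib = zlib  # proposed dataset-wide default
        self.variables = {}

    def createVariable(self, varname, datatype, dimensions=(), zlib=None, **kwargs):
        # Fall back to the dataset-wide default only when the caller
        # did not pass zlib explicitly.
        if zlib is None:
            zlib = self._default_zlib
        self.variables[varname] = {"dtype": datatype, "dims": dimensions,
                                   "zlib": zlib, **kwargs}
        return self.variables[varname]

ds = Dataset("out.nc", "w", zlib=True)
v1 = ds.createVariable("temp", "f4", ("time",))            # inherits zlib=True
v2 = ds.createVariable("lat", "f4", ("lat",), zlib=False)  # opt out per variable
```

Using None as the sentinel (rather than False) is what lets a caller explicitly opt a variable out of compression, as discussed below.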
I can see the benefit of this, but then again I'm not crazy about cluttering up Dataset.__init__ with a bunch of optional kwargs. In my own applications, I usually don't want to compress the coordinate variables. I'd like to hear how others feel about this.
Jeff,
In my own applications, I usually don't want to compress the coordinate variables
I'm curious as to why?
I suppose if 'zlib=True' were set in the constructor, then one would set 'zlib=False' to NOT define compressed variables?
-- Elizabeth
You could add default_compression as an attribute with no corresponding constructor argument.
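A sketch of that suggestion: the default lives in a plain attribute set after construction, so nothing is added to the constructor signature. Again a stand-in class, not the real library:

```python
class Dataset:
    """Stand-in showing a mutable default with no constructor argument."""

    default_compression = None  # class-level default: no compression

    def __init__(self, filename, mode="r"):
        self.filename = filename
        self.mode = mode
        self.variables = {}

    def createVariable(self, varname, datatype, dimensions=(), zlib=None):
        if zlib is None:
            # Honor the dataset-wide default unless the caller overrides it.
            zlib = self.default_compression == "zlib"
        self.variables[varname] = {"dtype": datatype, "zlib": zlib}
        return self.variables[varname]

ds = Dataset("out.nc", "w")
ds.default_compression = "zlib"   # set after construction; no __init__ clutter
v = ds.createVariable("temp", "f4", ("time",))
```

A string-valued attribute also leaves room for the additional compression methods mentioned below, which a boolean zlib kwarg would not.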
Bearing in mind that we at Unidata are considering adding some additional compression methods to netCDF, so it may not always be as simple as zlib=True/False.
Should we do compression='zlib' in anticipation of new compression methods? This will change the existing zlib= kwarg as well, right?
I'm hesitant to suggest changing/adding any interfaces based on things that are still very much in the proof of concept phase. If we're talking about already adding things, then I think it's good to consider the future capabilities.
OK... I think if the new "global" compression feature is to go forward, it would be zlib=True, like the current per-variable feature. If at a later date new compression features are added, both kwargs can be changed together.
The C code already has several additional compression algorithms in testing: bzip2, fpzip, and zfp. Would a "compress" kwarg with standardized names for the compression algorithms be a better approach?
Are any of those slated for release?
Not sure of the timetable. I have passed the HDF5 stuff off to LLNL. Note that right now, if you set up the proper env variables, you can automatically decode any files that use the compressors on this page.
https://support.hdfgroup.org/services/contributions.html
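For reference, the standard mechanism (not spelled out in the thread) is that HDF5 discovers third-party filter plugins through the HDF5_PLUGIN_PATH environment variable; the directory below is purely illustrative:

```shell
# Point HDF5 at a directory containing compiled filter plugins
# (e.g. bzip2, zfp) so files using those filters can be decoded.
export HDF5_PLUGIN_PATH=/usr/local/hdf5/lib/plugin
```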
I like @dopplershift's suggestion of a default_compression attribute. It would have to be added to the _private_atts list to prevent it being written to the file as a netCDF attribute.
I presume this attribute would be placed (virtually) in the root group?
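A toy sketch of the _private_atts mechanism being referred to, mimicking the dispatch in netCDF4-python's __setattr__ (the class and attribute storage here are simplified stand-ins, not the real implementation):

```python
# Names in _private_atts are stored on the Python object only; anything
# else is treated as a netCDF attribute that would be written to the file.
_private_atts = ["default_compression"]

class Group:
    """Toy stand-in mimicking netCDF4's __setattr__ dispatch."""

    def __init__(self):
        # Bypass __setattr__ while initializing our own storage.
        self.__dict__["ncattrs"] = {}  # attributes destined for the file

    def __setattr__(self, name, value):
        if name in _private_atts:
            self.__dict__[name] = value             # Python-side only
        else:
            self.__dict__["ncattrs"][name] = value  # persisted to the file

root = Group()
root.default_compression = "zlib"  # stays off disk
root.title = "My dataset"          # would be written as a netCDF attribute
```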
Related to https://github.com/Unidata/netcdf4-python/issues/759
If zlib stays hardcoded into the API while more compressors are on the way, the API should be extended and enriched. I propose implementing one of the following:
- compression="zlib", complevel=4
- compression=netCDF4.Compressors.ZLIB, complevel=4
- compression="4,zlib"
- compression="zlib,4"
- compression=netCDF4.Compressors.ZLIB(4)
Using netCDF4.Compressors.ZLIB as a compressor object/class would be the most extensible solution: any wrapper could easily add a new Compressor. On the other hand, the string "zlib" is easier for non-programmers to understand than objects/classes. Finally, "1,zlib" or "zlib,1" is compact and versatile, picking AND configuring the compressor at once, but it is definitely a bit cryptic. Then again, it is resistant to smaller API changes in the future.
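A hedged sketch of how the string forms above could be normalized into a (compressor, level) pair. The Compressor enum and parse_compression helper are hypothetical names invented for this illustration, not anything in netCDF4-python:

```python
from enum import Enum

class Compressor(Enum):
    """Hypothetical standardized compressor names."""
    ZLIB = "zlib"
    BZIP2 = "bzip2"
    ZFP = "zfp"

def parse_compression(spec, default_level=4):
    """Parse 'zlib', 'zlib,4', or '4,zlib' into (Compressor, level)."""
    parts = [p.strip() for p in str(spec).split(",")]
    name, level = None, default_level
    for p in parts:
        if p.isdigit():
            level = int(p)          # a bare number is the compression level
        else:
            name = Compressor(p)    # anything else must name a compressor
    if name is None:
        raise ValueError(f"no compressor named in {spec!r}")
    return name, level
```

Accepting both orders makes the "cumulative" string form order-insensitive, which removes some of its cryptic feel, while the enum remains available for programmatic callers.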