netcdf4-python
Proposal: Add zlib kwarg to netCDF4.Dataset() constructor
NetCDF zlib compression is great. However, the current Python API requires you to pass zlib=True for every variable you want compressed. Well... I want to compress ALL variables in my files without modifying the existing code that writes into those files.
The proposal here is to add an optional zlib kwarg to the netCDF4.Dataset() constructor. If you set zlib=True when opening a Dataset, the API will use that as the default value of zlib when you call createVariable().
Any thoughts on this? Would it be likely to be merged if I submit a PR that does this?
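A minimal sketch of the proposed behavior. This uses a stand-in class rather than the real netCDF4.Dataset (so the kwarg-defaulting logic is clear in isolation); the attribute name `_default_zlib` is illustrative, not part of any existing API:

```python
class Dataset:
    """Stand-in for netCDF4.Dataset illustrating the proposed zlib kwarg."""

    def __init__(self, filename, mode="r", zlib=False):
        self.filename = filename
        self.mode = mode
        self._default_zlib = zlib  # proposed dataset-wide default
        self.variables = {}

    def createVariable(self, varname, datatype, dimensions=(), zlib=None, **kwargs):
        # Fall back to the dataset-wide default only when the caller
        # did not pass zlib explicitly.
        if zlib is None:
            zlib = self._default_zlib
        self.variables[varname] = {"dtype": datatype, "dims": dimensions,
                                   "zlib": zlib, **kwargs}
        return self.variables[varname]

ds = Dataset("out.nc", "w", zlib=True)
v1 = ds.createVariable("temp", "f4", ("time",))            # inherits zlib=True
v2 = ds.createVariable("lat", "f4", ("lat",), zlib=False)  # opt out per variable
```

Using None as the sentinel (rather than False) is what lets a caller explicitly opt a variable out of compression, as discussed below.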
I can see the benefit of this, but then again I'm not crazy about cluttering up Dataset.__init__ with a bunch of optional kwargs. In my own applications, I usually don't want to compress the coordinate variables. I'd like to hear how others feel about this.
Jeff,
In my own applications, I usually don't want to compress the coordinate variables
I'm curious as to why?
I suppose if 'zlib=True' were set in the constructor, then one would set 'zlib=False' to NOT define compressed variables?
-- Elizabeth
You could add default_compression as an attribute with no corresponding constructor argument.
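A sketch of that suggestion: the default lives in a plain attribute set after construction, so nothing is added to the constructor signature. Again a stand-in class, not the real library:

```python
class Dataset:
    """Stand-in showing a mutable default with no constructor argument."""

    default_compression = None  # class-level default: no compression

    def __init__(self, filename, mode="r"):
        self.filename = filename
        self.mode = mode
        self.variables = {}

    def createVariable(self, varname, datatype, dimensions=(), zlib=None):
        if zlib is None:
            # Honor the dataset-wide default unless the caller overrides it.
            zlib = self.default_compression == "zlib"
        self.variables[varname] = {"dtype": datatype, "zlib": zlib}
        return self.variables[varname]

ds = Dataset("out.nc", "w")
ds.default_compression = "zlib"   # set after construction; no __init__ clutter
v = ds.createVariable("temp", "f4", ("time",))
```

A string-valued attribute also leaves room for the additional compression methods mentioned below, which a boolean zlib kwarg would not.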
Bearing in mind that we at Unidata are considering adding some additional compression methods to netCDF, so it may not always be as simple as zlib=True/False.
Should we do compression='zlib' in anticipation of new compression methods? This will change the existing zlib= kwarg as well, right?
I'm hesitant to suggest changing/adding any interfaces based on things that are still very much in the proof of concept phase. If we're talking about already adding things, then I think it's good to consider the future capabilities.
OK... I think if the new "global" compression feature is to go forward, it would be zlib=True, like the current per-variable feature. If at a later date new compression features are added, both kwargs can be changed together.
The C code already has several additional compression algorithms in testing: bzip2, fpzip, and zfp. Would a "compress" kwarg with standardized names for the compression algorithms be a better approach?
Are any of those slated for release?
Not sure of the timetable. I have passed the HDF5 stuff off to LLNL. Note that right now, if you set up the proper env variables, you can automatically decode any files that use the compressors on this page.
https://support.hdfgroup.org/services/contributions.html
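For reference, the standard mechanism (not spelled out in the thread) is that HDF5 discovers third-party filter plugins through the HDF5_PLUGIN_PATH environment variable; the directory below is purely illustrative:

```shell
# Point HDF5 at a directory containing compiled filter plugins
# (e.g. bzip2, zfp) so files using those filters can be decoded.
export HDF5_PLUGIN_PATH=/usr/local/hdf5/lib/plugin
```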
I like @dopplershift's suggestion of a default_compression attribute. It would have to be added to the _private_atts list to prevent it being written to the file as a netCDF attribute.
I presume this attribute would be placed (virtually) in the root group?
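A toy sketch of the _private_atts mechanism being referred to, mimicking the dispatch in netCDF4-python's __setattr__ (the class and attribute storage here are simplified stand-ins, not the real implementation):

```python
# Names in _private_atts are stored on the Python object only; anything
# else is treated as a netCDF attribute that would be written to the file.
_private_atts = ["default_compression"]

class Group:
    """Toy stand-in mimicking netCDF4's __setattr__ dispatch."""

    def __init__(self):
        # Bypass __setattr__ while initializing our own storage.
        self.__dict__["ncattrs"] = {}  # attributes destined for the file

    def __setattr__(self, name, value):
        if name in _private_atts:
            self.__dict__[name] = value             # Python-side only
        else:
            self.__dict__["ncattrs"][name] = value  # persisted to the file

root = Group()
root.default_compression = "zlib"  # stays off disk
root.title = "My dataset"          # would be written as a netCDF attribute
```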
Related to https://github.com/Unidata/netcdf4-python/issues/759
If zlib stays hardcoded into the API while more compressors are on the way, the API should be extended and enriched. I propose implementing one of the following:
- compression="zlib", complevel=4
- compression=netCDF4.Compressors.ZLIB, complevel=4
- compression="4,zlib"
- compression="zlib,4"
- compression=netCDF4.Compressors.ZLIB(4)
Using netCDF4.Compressors.ZLIB as a compressor object/class would be the most extensible solution: any wrapper could easily add a new Compressor. On the other hand, the string "zlib" is easier for non-programmers to understand than objects/classes. Finally, "1,zlib" or "zlib,1" is compact and versatile, picking AND configuring the compressor at once, but it is definitely a bit cryptic. Then again, it is resistant to smaller API changes in the future.
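A hedged sketch of how the string forms above could be normalized into a (compressor, level) pair. The Compressor enum and parse_compression helper are hypothetical names invented for this illustration, not anything in netCDF4-python:

```python
from enum import Enum

class Compressor(Enum):
    """Hypothetical standardized compressor names."""
    ZLIB = "zlib"
    BZIP2 = "bzip2"
    ZFP = "zfp"

def parse_compression(spec, default_level=4):
    """Parse 'zlib', 'zlib,4', or '4,zlib' into (Compressor, level)."""
    parts = [p.strip() for p in str(spec).split(",")]
    name, level = None, default_level
    for p in parts:
        if p.isdigit():
            level = int(p)          # a bare number is the compression level
        else:
            name = Compressor(p)    # anything else must name a compressor
    if name is None:
        raise ValueError(f"no compressor named in {spec!r}")
    return name, level
```

Accepting both orders makes the "cumulative" string form order-insensitive, which removes some of its cryptic feel, while the enum remains available for programmatic callers.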