NetCDF.jl

zstd compression support

Open milankl opened this issue 3 years ago • 10 comments

NetCDF v4.9 was released last month: https://github.com/Unidata/netcdf-c/releases/tag/v4.9.0. Are there any plans to update NetCDF_jll.jl (which currently includes v4.8) and this package accordingly? I see @Alexander-Barth already created https://github.com/Alexander-Barth/NetCDF_jll.jl with builds from v4.9. Would love to start using the new zstandard compression available in v4.9... 😄

milankl avatar Jul 26 '22 13:07 milankl

Would indeed be great. The problem is getting a working build for all platforms. The 4.9 builds in https://github.com/Alexander-Barth/NetCDF_jll.jl/releases/tag/NetCDF-v400.902.9%2B0 are missing Windows for instance.

There is some hope that a newer HDF5 build will help, see https://github.com/JuliaPackaging/Yggdrasil/issues/4511#issuecomment-1155003075.

visr avatar Jul 26 '22 14:07 visr

Thanks for pushing for this in hdf5-feedstock! Fingers crossed this is available at some point soon

milankl avatar Jul 27 '22 14:07 milankl

Turns out not to have helped at all... :( https://github.com/JuliaPackaging/Yggdrasil/issues/4511#issuecomment-1198134988

visr avatar Jul 28 '22 13:07 visr

As a test, I have enabled zstandard in these builds, if you want to try them (they are not yet available in Yggdrasil):

https://github.com/Alexander-Barth/NetCDF_jll.jl/releases/tag/NetCDF-v400.902.19%2B0

NetCDF is now built with these options:

[17:07:52]   --cc            -> cc                       
[17:07:52]   --cflags        -> -I/workspace/destdir/include -I/workspace/destdir/include
[17:07:52]   --libs          -> -L/workspace/destdir/lib -lnetcdf
[17:07:52]   --static        -> -lm -lsz -lzstd -lxml2 -lhdf5-0 -lhdf5_hl-0 -lcurl-4 -lz
[17:07:52]                                               
[17:07:52]   --has-c++       -> no                                                            
[17:07:52]   --cxx           ->                                                   
[17:07:52]                                              
[17:07:52]   --has-c++4      -> no                       
[17:07:52]   --cxx4          ->                          
[17:07:52]                                               
[17:07:52]   --has-fortran   -> no                     
[17:07:52]   --has-dap       -> yes                 
[17:07:52]   --has-dap2      -> yes
[17:07:52]   --has-dap4      -> yes
[17:07:52]   --has-nc2       -> yes
[17:07:52]   --has-nc4       -> yes
[17:07:52]   --has-hdf5      -> yes
[17:07:52]   --has-hdf4      -> no
[17:07:52]   --has-logging   -> no
[17:07:52]   --has-pnetcdf   -> no
[17:07:52]   --has-szlib     -> yes
[17:07:52]   --has-cdf5      -> yes
[17:07:52]   --has-parallel4 -> no
[17:07:52]   --has-parallel  -> no
[17:07:52]   --has-nczarr    -> yes
[17:07:52]   --has-zstd      -> yes
[17:07:52]   --has-benchmarks -> no

Presumably some functions like these need to be wrapped:

include/netcdf_filter.h:EXTERNL int nc_def_var_zstandard(int ncid, int varid, int level);
include/netcdf_filter.h:EXTERNL int nc_inq_var_zstandard(int ncid, int varid, int* hasfilterp, int *levelp);

Alexander-Barth avatar Aug 10 '22 15:08 Alexander-Barth

We now ship libnetcdf v4.9. Since zstd compression was brought up and is not supported yet, I'll rename the issue to that. https://github.com/JuliaPackaging/Yggdrasil/pull/5319 has quite a bit of helpful information on what might be needed to make it work.

visr avatar Aug 20 '22 21:08 visr

I'm curious if using H5Zzstd would enable zstd compression support when using netcdf via HDF5.

mkitti avatar Aug 22 '22 07:08 mkitti

I think zstd should now work.

Details in https://github.com/Alexander-Barth/NCDatasets.jl/issues/116#issuecomment-1573850114

visr avatar Jun 02 '23 14:06 visr

Is there a quick example somewhere of how to use zstd vs zlib? At the moment I'm just doing

compression_level = 3
NcVar(...,compress=compression_level)

milankl avatar Jun 02 '23 15:06 milankl

Ah yes, the underlying NetCDF build should now support zstd, but it is not yet exposed in the Julia API. For zlib, the C API has nc_def_var_deflate and nc_inq_var_deflate. For zstd, according to https://github.com/Unidata/netcdf-c/issues/2173 (which cites your paper, I see), two C API functions have been added: nc_def_var_zstandard and nc_inq_var_zstandard. Those would need to be wrapped first and then made available to users in the Julia API.
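
A minimal sketch of what such low-level wrappers might look like, assuming NetCDF_jll exports `libnetcdf` and that the shipped build has zstd enabled. The Julia names `check`, `def_var_zstandard` and `inq_var_zstandard` are hypothetical and not part of NetCDF.jl; the C signatures are the ones from include/netcdf_filter.h quoted earlier in this thread.

```julia
# Assumption: NetCDF_jll provides a libnetcdf >= 4.9 build with zstd support.
using NetCDF_jll: libnetcdf

const NC_NOERR = Cint(0)

# Hypothetical helper: turn a non-zero netCDF status code into a Julia error,
# using nc_strerror from the C API for the message text.
function check(status::Cint)
    status == NC_NOERR && return nothing
    msg = unsafe_string(ccall((:nc_strerror, libnetcdf), Cstring, (Cint,), status))
    error("NetCDF error $status: $msg")
end

# Hypothetical wrapper around
#   int nc_def_var_zstandard(int ncid, int varid, int level);
function def_var_zstandard(ncid::Integer, varid::Integer, level::Integer)
    check(ccall((:nc_def_var_zstandard, libnetcdf), Cint,
                (Cint, Cint, Cint), ncid, varid, level))
end

# Hypothetical wrapper around
#   int nc_inq_var_zstandard(int ncid, int varid, int* hasfilterp, int* levelp);
function inq_var_zstandard(ncid::Integer, varid::Integer)
    hasfilter = Ref{Cint}(0)
    level = Ref{Cint}(0)
    check(ccall((:nc_inq_var_zstandard, libnetcdf), Cint,
                (Cint, Cint, Ptr{Cint}, Ptr{Cint}),
                ncid, varid, hasfilter, level))
    return (hasfilter = hasfilter[] != 0, level = Int(level[]))
end
```

With wrappers along these lines, a user-facing keyword (analogous to the existing `compress` option) could call `def_var_zstandard` while the variable is being defined. How that keyword should be named and exposed is a design choice for the package and is not settled here.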

visr avatar Jun 02 '23 15:06 visr

It may be useful for me to demonstrate how the HDF5.jl interface currently works:

julia> using HDF5, H5Zzstd, H5Zbitshuffle

julia> h5open("test.h5","w", libver_bounds=v"1.8", meta_block_size=4096) do h5f
            write_dataset(
               h5f,
               "zstdcomp_dataset",
               rand(1:10, 256, 256),
               chunk=(16,16),
               filters=[BitshuffleFilter(), ZstdFilter(9)]
           )
       end

julia> run(`h5ls -v test.h5`)
Opened "test.h5" with sec2 driver.
zstdcomp_dataset         Dataset {256/256, 256/256}
    Location:  1:195
    Links:     1
    Modified:  2023-06-02 13:55:05 EDT
    Chunks:    {16, 16} 2048 bytes
    Storage:   524288 logical bytes, 37866 allocated bytes, 1384.59% utilization
    Filter-0:  HDF5 bitshuffle filter; see https://github.com/kiyo-masui/bitshuffle-32008 OPT {0, 4, 8, 0, 0, 0}
    Filter-1:  Zstandard compression: http://www.zstd.net-32015 OPT {9}
    Type:      native long
Process(`h5ls -v test.h5`, ProcessExited(0))

See https://juliaio.github.io/HDF5.jl/stable/interface/filters/

mkitti avatar Jun 02 '23 17:06 mkitti