zstandard filter is not built or installed?
I just tried to add a test for zstandard to the code. But it fails.
It turns out that we are not building and installing the zstandard filter code itself. This is a small piece of C code that we must compile and install on the user's system, or else zstandard cannot work for them.
We build (but do not install) the bzip2 filter.
We need to build the zstandard one as well.
And we should install both filters in the HDF5 filter directory, so that netCDF can use them.
@WardF I believe this should be resolved before the release.
As it is now, the zstandard feature is non-functional. Users would still have to install CCR to use it (because CCR builds and installs the filter code). So this makes putting zstandard in netCDF pretty pointless. ;-)
I am home from a week off and wading through emails; I will return to this as I review the others. In the meantime, @DennisHeimbigner, any thoughts?
This problem is with current main, correct?
I just built the main branch and ran the test that exercises various filters, and it seems to work fine. Note, however, that the relevant test, nc_test4/tst_specific_filters.sh, has some things turned off; use the attached version.
Did you look at the output of configure to see if it found libzstd?
It found libzstd.
Could it be that you already have the zstandard filter installed in your HDF5 filter path? ;-)
I guess I assumed you had set HDF5_PLUGIN_PATH to point to your own directory of plugins.
Well, imagine the poor netCDF user who has never heard of HDF5 plugins but wants to use this new compression. It won't work! :-)
So we need to build the zstandard filter, and install it in HDF5_PLUGIN_PATH, or else the nc_def_var_zstandard() function cannot work. This is what CCR does. It's easy and works fine. Then the user just installs netcdf-c, and they can start using nc_def_var_zstandard().
Also, if some other user sends them a netCDF file with zstandard compression, it will not be readable. In order for zstandard-compressed files to be readable by all users, as zlib-compressed files are, we must also install the filter in HDF5_PLUGIN_PATH.
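For concreteness, here is a minimal sketch of the user-facing call under discussion, assuming the CCR-style signature nc_def_var_zstandard(ncid, varid, level); the file name, variable name, and level are made up, and error handling is reduced to a bail-out macro:

```c
#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>
#include <netcdf_filter.h> /* assumed home of nc_def_var_zstandard() */

/* Print the netCDF error and give up. */
#define ERR(e) do { fprintf(stderr, "Error: %s\n", nc_strerror(e)); exit(2); } while (0)

int main(void)
{
    int ncid, dimid, varid, ret;
    float data[1024] = {0};

    if ((ret = nc_create("test_zstd.nc", NC_NETCDF4, &ncid))) ERR(ret);
    if ((ret = nc_def_dim(ncid, "x", 1024, &dimid))) ERR(ret);
    if ((ret = nc_def_var(ncid, "data", NC_FLOAT, 1, &dimid, &varid))) ERR(ret);

    /* Request zstandard compression at level 4. This can only succeed
     * if HDF5 can locate the zstandard filter plugin, which is exactly
     * the installation problem this issue is about. */
    if ((ret = nc_def_var_zstandard(ncid, varid, 4))) ERR(ret);

    if ((ret = nc_put_var_float(ncid, varid, data))) ERR(ret);
    if ((ret = nc_close(ncid))) ERR(ret);
    return 0;
}
```

On the read side no special call is needed: a plain nc_open()/nc_get_var_float() decompresses transparently, again provided the plugin is installed where HDF5 can find it.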
I am lost. Let me make sure that I understand:
- Your HDF5_PLUGIN_PATH points to some CCR-created directory that holds the zstd HDF5 filter wrapper.
- Your attempt to create a file with zstd compression fails, or
- Your attempt to open a file with zstd compression fails.
Correct?
No, I do not have the HDF5 filter wrapper installed. This is a fresh machine.
So I install zlib and zstandard with apt-get, and build HDF5.
When netcdf-c is built, zstandard is found and built. But when I try to use it, I get a filter not found error.
What is the value of HDF5_PLUGIN_PATH ?
I think I get it. You want to bypass the HDF5_PLUGIN_PATH altogether, right?
What CCR does is include this self-contained autoconf project: https://github.com/ccr/ccr/tree/master/hdf5_plugins/ZSTANDARD
This builds the code here: https://github.com/ccr/ccr/blob/master/hdf5_plugins/ZSTANDARD/src/H5Zzstandard.c
This generates the plugin shared library for zstandard.
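To make the shape of that code concrete, here is a condensed, illustrative sketch of what such a wrapper contains; it is not a copy of the CCR file above, though 32015 really is the registered HDF5 filter id for zstandard:

```c
#include <stdlib.h>
#include <zstd.h>
#include <hdf5.h>
#include <H5PLextern.h>

#define H5Z_FILTER_ZSTANDARD 32015 /* registered filter id for zstd */

/* The filter callback: compress on write, decompress on read. */
static size_t
H5Z_filter_zstandard(unsigned int flags, size_t cd_nelmts,
                     const unsigned int cd_values[], size_t nbytes,
                     size_t *buf_size, void **buf)
{
    void *outbuf = NULL;
    size_t outlen;

    if (flags & H5Z_FLAG_REVERSE) {
        /* Read path: recover the original size, then decompress. */
        unsigned long long rawlen = ZSTD_getFrameContentSize(*buf, nbytes);
        if (rawlen == ZSTD_CONTENTSIZE_ERROR ||
            rawlen == ZSTD_CONTENTSIZE_UNKNOWN ||
            !(outbuf = malloc((size_t)rawlen)))
            return 0;
        outlen = ZSTD_decompress(outbuf, (size_t)rawlen, *buf, nbytes);
    } else {
        /* Write path: cd_values[0] carries the compression level. */
        int level = cd_nelmts > 0 ? (int)cd_values[0] : 3;
        size_t bound = ZSTD_compressBound(nbytes);
        if (!(outbuf = malloc(bound)))
            return 0;
        outlen = ZSTD_compress(outbuf, bound, *buf, nbytes, level);
    }
    if (ZSTD_isError(outlen)) { free(outbuf); return 0; }

    free(*buf);
    *buf = outbuf;
    *buf_size = outlen;
    return outlen; /* returning 0 signals failure to HDF5 */
}

static const H5Z_class2_t H5Z_ZSTANDARD[1] = {{
    H5Z_CLASS_T_VERS,                   /* H5Z_class_t version      */
    (H5Z_filter_t)H5Z_FILTER_ZSTANDARD, /* filter id                */
    1, 1,                               /* encoder, decoder present */
    "zstandard",                        /* filter name              */
    NULL, NULL,                         /* can_apply, set_local     */
    (H5Z_func_t)H5Z_filter_zstandard,   /* the filter callback      */
}};

/* The two symbols HDF5 looks up when it dlopen()s a shared library
 * found on the plugin path. */
H5PL_type_t H5PLget_plugin_type(void) { return H5PL_TYPE_FILTER; }
const void *H5PLget_plugin_info(void) { return H5Z_ZSTANDARD; }
```

Built as a shared library and dropped into the plugin directory, HDF5 (and therefore netCDF) picks it up automatically.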
During testing, CCR manipulates the HDF5_PLUGIN_PATH to get the tests to work. During install, the plugin is installed in HDF5_PLUGIN_DIR.
Once that is done, then of course netcdf-c can use the zstandard plugin. But without the plugin, putting zstandard into netCDF is incomplete.
IIRC you have a copy of the bzip2 filter code; you could just do the same for the zstandard one. Then install them in HDF5_PLUGIN_DIR, and everything will magically work for readers and writers.
I did some experimenting with this idea. One problem that crops up is that HDF5_PLUGIN_PATH is an actual search path, not a single directory: it can be of the form "<dir1>:<dir2>:..." (or "<dir1>;<dir2>;..." on Windows). In that case, it is not clear where to install the wrappers. I think it would be best if we had a ./configure option of this form:
--with-standard-filter-install=<dir>
so that a specific install location is defined.
Another point. In order to protect the user, I would propose that the installation not overwrite existing filter shared libraries. Rather, the user would be required to manually remove any existing installed filter wrappers. The idea is to prevent inadvertent overwrites.
I agree with the extra option.
CCR has an option:
# If the env. variable HDF5_PLUGIN_PATH is set, or if
# --with-hdf5-plugin-dir=<directory>, use it as the plugin
# directory. This is necessary at the top level to provide the help
# message and --with option. If used, the option will be passed to the
# subdirs, and also handled by the configure in each filter
# subdirectory.
AC_MSG_CHECKING([where to put HDF5 plugins])
AC_ARG_WITH([hdf5-plugin-path],
            [AS_HELP_STRING([--with-hdf5-plugin-path=<directory>],
                            [specify HDF5 plugin path (defaults to /usr/local/hdf5/lib/plugin, or value of HDF5_PLUGIN_PATH, if set)])],
            [HDF5_PLUGIN_PATH=$with_hdf5_plugin_path])
HDF5_PLUGIN_PATH=${HDF5_PLUGIN_PATH-.}
AC_MSG_RESULT($HDF5_PLUGIN_PATH)
As for not overwriting, OK, if you want, but I note we don't apply that rule to any other binary in the package. That is, we will happily install our netcdf.so file on top of an existing one, without even warning the user. When they do a "make install" they get it installed, and we trust they know what they are doing.
@DennisHeimbigner do you want me to make a PR or are you going to do it?
I sense that @WardF wants to release soon, so we should get this done. It needs to be in the next release...
I'm coming from the outside, but it seems you are considering setting HDF5_PLUGIN_PATH. Environment variables can be quite problematic, especially on Windows, where multiple copies of the C runtime library often exist. I would like to discourage that and instead encourage the use of the H5PL family of functions, which allows you to add a list of paths via an API:
https://docs.hdfgroup.org/hdf5/develop/group___h5_p_l.html
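For illustration, a minimal sketch of what that looks like from an application; the directory shown is just a placeholder, not an agreed-upon install location:

```c
#include <hdf5.h>

int main(void)
{
    /* Append a directory to HDF5's in-process plugin search list,
     * without touching the HDF5_PLUGIN_PATH environment variable. */
    if (H5PLappend("/usr/local/hdf5/lib/plugin") < 0)
        return 1;

    /* Any nc_open()/nc_create() performed after this point can also
     * find dynamically loaded filters in that directory. */
    return 0;
}
```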
One problem for us is that until HDF5 version 1.13 is in very general use, we are stuck with HDF5_PLUGIN_PATH. But in the end, the client has to specify the search path for finding plugins, and that specification is potentially different for every use of the library. The new 1.13 API would give us the freedom to get the path from places other than an environment variable. What do you propose we use to replace HDF5_PLUGIN_PATH?
H5PLappend was introduced in 1.10.1:
https://portal.hdfgroup.org/display/HDF5/H5PL_APPEND
1.13 is not required for this functionality.
The plugin interface might even be available in the HDF5 1.8 series: https://portal.hdfgroup.org/display/HDF5/Software+Changes+from+Release+to+Release+for+HDF5-1.8
I stand corrected about the append operation. Still, the question remains: what alternative do we use? As I see it, we have the following options:
- HDF5_PLUGIN_PATH
- The .netrc/.dodsrc file
Remember that the issue is being able to pass the path info to the netcdf-c library at run time, independently of the program that is using the library.
I think retaining the first option is a good idea, as we've already started down that route. Adding additional options (to the extent it remains practical) is a great idea, but I'm hesitant to remove ones we have introduced.