netcdf-c icon indicating copy to clipboard operation
netcdf-c copied to clipboard

zstandard filter is not built or installed?

Open edwardhartnett opened this issue 2 years ago • 21 comments

I just tried to add a test for zstandard to the code. But it fails.

It turns out that we are not building and installing the zstandard filter code itself. This is a small C program that we must compile and install on the users system, or else zstandard cannot work for them.

We build (but do not install) the bzip2 filter.

We need to build the zstandard one as well.

And we should install both filters in the HDF5 filter directory, so that netCDF can use them.

@WardF I believe this should be resolved before the release.

As it is now, the zstandard feature is non-functional. Users would still have to install CCR to use it (because CCR builds and installs the filter code). So this makes putting zstandard in netCDF pretty pointless. ;-)

edwardhartnett avatar Apr 19 '22 14:04 edwardhartnett

I am home from a week off and wading through emails; I will return to this as I review the others, in the meantime, @DennisHeimbigner any thoughts?

WardF avatar Apr 19 '22 17:04 WardF

This problem is with current main, correct?

DennisHeimbigner avatar Apr 19 '22 19:04 DennisHeimbigner

I just built the main branch and ran the test that exercises various filters and it seems to work fine, except the relevant test: nc_test4/tst_specific_filters.sh has some things turned off. Use the attached version.

Did you look at the output of configure to see if it found libzstd?

tst_specific_filters.txt

DennisHeimbigner avatar Apr 19 '22 20:04 DennisHeimbigner

It found libzstd.

Could it be that you already have the zstandard filter installed in your HDF5 filter path? ;-)

edwardhartnett avatar Apr 19 '22 20:04 edwardhartnett

I guess I assumed you had set HDF5_PLUGIN_PATH to point to your own directory of plugins.

DennisHeimbigner avatar Apr 19 '22 20:04 DennisHeimbigner

Well imagine the poor netCDF user who has never heard of HDF5 plugins but wants to use this new compression. It won't work! :-)

So we need to build the zstandard filter, and install it in HDF5_PLUGIN_PATH, or else the nc_def_var_zstandard() function cannot work. This is what CCR does. It's easy and works fine. Then the user just installs netcdf-c, and they can start using nc_def_var_zstandard().

Also, if some other user sends them a netCDF file with zstandard compression, it will not be readable. In order for zstandard to be readable to all users, like zlib, we must also install the filter in HDF5_PLUGIN_PATH.

edwardhartnett avatar Apr 19 '22 20:04 edwardhartnett

I am lost. Let me make sure that I understand,

  1. your HDF5_PLUGIN_PATH points to some ccr created directory that holds the zstd HDF5 filter wrapper.
  2. Your attempt to create a file with zstd compression fails, or
  3. Your attempt to open a file with zstd compression fails.

Correct?

DennisHeimbigner avatar Apr 19 '22 21:04 DennisHeimbigner

No I do not have the HDF5 filter wrapper installed. This is a fresh machine.

So I install zlib and zstandard with apt-get, and build HDF5.

When netcdf-c is built, zstandard is found and built. But when I try to use it, I get a filter not found error.

edwardhartnett avatar Apr 19 '22 22:04 edwardhartnett

What is the value of HDF5_PLUGIN_PATH ?

DennisHeimbigner avatar Apr 19 '22 22:04 DennisHeimbigner

I think I get it. You want to bypass the HDF5_PLUGIN_PATH altogether, right?

DennisHeimbigner avatar Apr 19 '22 22:04 DennisHeimbigner

What CCR does is include this self-contained autoconf project: https://github.com/ccr/ccr/tree/master/hdf5_plugins/ZSTANDARD

This builds the code here: https://github.com/ccr/ccr/blob/master/hdf5_plugins/ZSTANDARD/src/H5Zzstandard.c

This generates the plugin shared library for zstandard.

During testing, CCR manipulates the HDF5_PLUGIN_PATH to get the tests to work. During install, the plugin is installed in HDF5_PLUGIN_DIR.

Once that is done, then of course netcdf-c can use the zstandard plugin. But without the plugin, putting zstandard into netCDF is incomplete.

IIRC you have a copy of the bzip filter code, you could just do the same for the zstandard. Then install them in HDF5_PLUGIN_DIR, and everything will magically work for readers and writers.

edwardhartnett avatar Apr 19 '22 22:04 edwardhartnett

I did some experimenting with this idea. One problem that crops up is that HDF5_PLUGIN_PATH is an actual path, not a single directory. So it could be of the form "<dir1>:<dir2>: ..." or "<dir1>;<dir2>; ..." In this case, it is not clear where to install the wrappers. I think it would be best if we had a ./configure option of this form:

--with-standard-filter-install=<dir>

so that a specific install location is defined.

DennisHeimbigner avatar Apr 20 '22 18:04 DennisHeimbigner

Another point. In order to protect the user, I would propose that the installation not overwrite existing filter shared librarys. Rather, the user would be required to manually remove any existing installed filter wrappers. The idea is to prevent inadvertent overwrites.

DennisHeimbigner avatar Apr 20 '22 19:04 DennisHeimbigner

I agree with the extra option.

CCR has an option:

# If the env. variable HDF5_PLUGIN_PATH is set, or if
# --with-hdf5-plugin-dir=<directory>, use it as the plugin
# directory. This is necessary at the top level to provide the help
# message and --with option. If used, the option will be passed to the
# subdirs, and also handled by the configure in each filter
# subdirectory.
AC_MSG_CHECKING([where to put HDF5 plugins])
AC_ARG_WITH([hdf5-plugin-path],
            [AS_HELP_STRING([--with-hdf5-plugin-path=<directory>],
                            [specify HDF5 plugin path (defaults to /usr/local/hdf5/lib/plugin, or value of HDF5_PLUGIN_PATH, if set)])],
            [HDF5_PLUGIN_PATH=$with_hdf5_plugin_path])
HDF5_PLUGIN_PATH=${HDF5_PLUGIN_PATH-.}
AC_MSG_RESULT($HDF5_PLUGIN_PATH)

As for not overwriting, OK, if you want, but I note we don't apply that rule to any other binary in the package. That is, we will happily install our netcdf.so file on top of an existing one, without even warning the user. When they do a "make install" they get it installed, and we trust they know what they are doing.

edwardhartnett avatar Apr 20 '22 19:04 edwardhartnett

@DennisHeimbigner do you want me to make a PR or are you going to do it?

I sense that @WardF wants to release soon, so we should get this done. It needs to be in the next release...

edwardhartnett avatar Apr 23 '22 12:04 edwardhartnett

I'm coming from the outside, but it seems you are considering setting HDF5_PLUGIN_PATH. Environment variables can quite problematic, especially on Windows where mutiple copies of the C runtime library often exist. I would like to discourage that and instead encourage the use of the H5PL family of functions, which allows you to add a list of paths via a API.

https://docs.hdfgroup.org/hdf5/develop/group___h5_p_l.html

mkitti avatar Aug 17 '22 16:08 mkitti

One problem for us is that until HDF5 version 1.13 is in very general use, we are stuck with HDF5_PLUGIN_PATH. But in the end, the client has to specify the search path for finding plugins. And that spec must be potentially different for every use of the library. The new 1.13 API would give us the freedom to get the path from places other than an environment variable. What do you propose we use to replace HDF5_PLUGIN_PATH?

DennisHeimbigner avatar Aug 17 '22 19:08 DennisHeimbigner

H5PLappend was introduced in 1.10.1: https://portal.hdfgroup.org/display/HDF5/H5PL_APPEND

1.13 is not required for this functionality.

mkitti avatar Aug 17 '22 19:08 mkitti

The plugin interface might even be available in HDF5 1.8 series: https://portal.hdfgroup.org/display/HDF5/Software+Changes+from+Release+to+Release+for+HDF5-1.8

mkitti avatar Aug 17 '22 19:08 mkitti

I stand corrected about the append operation. Still, the question remains about what alternative do we use. As I see it, we have the following options:

  1. HDF5_PLUGIN_PATH
  2. The .netrc/.dodsrc file

Remember that the issue is being able to pass the path info to the netcdf-c library at run-time independent of the program that is using the library.

DennisHeimbigner avatar Aug 17 '22 19:08 DennisHeimbigner

I think retaining the first option is a good idea, as we've already started down that route. Adding additional options (to the extent it remains practical) is a great idea, but I'm hesitant to remove ones we have introduced.

WardF avatar Aug 17 '22 20:08 WardF