H5Z-ZFP icon indicating copy to clipboard operation
H5Z-ZFP copied to clipboard

Software engineering improvements

Open markcmiller86 opened this issue 2 years ago • 5 comments

Consider here several possible improvements to the way the code is designed...

Encode ZFP params only in as expert mode params

With regards to a proposal for a solution for this issue, https://github.com/LLNL/H5Z-ZFP/issues/100#issuecomment-1449016452 makes a compelling case for what to encode cb_values in a clean way, that is, to always encode cb_values as if expert mode was used.

https://github.com/LLNL/H5Z-ZFP/issues/100#issuecomment-1449082295 makes the point that the current situation is needed for a user interface point of view, to avoid making h5repack useable only for users who can juggle with expert values.

I wondered there's a way to extract/compute the expert level values from the zfp_stream_set_{rate|precision|accuracy|reversible} zfp calls. I checked what they do, and indeed, they seem to simply compute and set the 4 "expert mode" integers in the zfp header. So it's possible to turn all other zfpmode inputs into cb_values encoded in expert mode.

A patch proposal would thus be to:

  • Document that only hd5 files encoded in expert mode are portable across different architectures
  • Change print_h5repack_farg.c to always generate cb_values with zfpmode=4 (-f UD=32013,0,5,4,…)
  • Update the examples and tests accordingly
  • Possibly print a warning about portability during encoding/decoding on big endian machines, if zfpmode in cb_values is not 4/expert

This would:

  • always generate files compatible with old versions of H5Z-ZFP, which supports expert mode cb_values
  • stop generating new files that cannot be ported across architectures
  • require no change in the h5repack API: only changes to print_h5repack_farg.c and documentation

Originally posted by @spanezz in https://github.com/LLNL/H5Z-ZFP/issues/100#issuecomment-1449756045

Also, see this note from @lindstro

Simplify headers and plugin/lib to single header and single source file

There are 6 header files and 2 sources files. I think this could be simplified down to a single source, H5Zzfp.c and single header file, H5Zzfp.h and still maintain all the current build options and interfaces.

This would mean putting all the property functions currently in H5Zzfp_props.c in H5Zzfp.c which would mean that source file contains more than just the code implementing the HDF5 filter interface. But, I think that is fine and there is already some of that going on there because it implements the H5Z_zfp_initialize() and H5Z_zfp_finalize() functions. Callers wishing to use the filter as a library would just explicitly link to the filter instead of only ensuring HDF5 library can find and dlopen it via HDF5_PLUGIN_PATH env. variable.

Should we eliminate GNU Make?

We're maintaining two build interfaces. I tend to use GNU Make but I know others like/want CMake. We get support on Windows only from CMake. I don't like forcing a dependence on CMake in order to build. OTOH, CMake is pretty ubiquitous now...especially if we're conservative about min version requirement. Our CI is uses GNU Make for Linux and CMake for Windows. We don't run tests with CMake...we'd need to plug that together (which may mostly already be addressed in #89).

Better organization of docs

I've got some nitty-gritty details and general usage mixed together in the documentation which I think makes things a bit confusing. So, these could be re-organized a tad better moving a lot of the details to something like an Advanced Issues section.

markcmiller86 avatar Mar 04 '23 22:03 markcmiller86

@brtnfld, @jwsblokland, @spanezz, @helmutg, @leighorf, @lindstro if any of you have comments on the above or related items, please let me know asap. Am targetting updated release end of this month with recent fixes plus these items and some misc. other items @lindstro reported to me in email back in Aug, 2022.

markcmiller86 avatar Mar 04 '23 23:03 markcmiller86

I think I agree with all of the above. There are some other potential improvements, such as support for querying the compression parameters previously used during compression (#105), support for OpenMP (de)compression, and support for GPU (de)compression, but I imagine those are well beyond the scope of what can be done over the coming weeks.

lindstro avatar Mar 04 '23 23:03 lindstro

such as support for querying the compression parameters previously used during compression (#105)

FYI...thats included in the 1.1.1 milestone. So, I am hoping to complete that. In fact, I've coded most of it but have been puzzing over where to put the code, https://github.com/LLNL/H5Z-ZFP/blob/c9878d117b84e9ef4fa1fad62370fba99457ad34/src/H5Zzfp_props.c#L114-L281

markcmiller86 avatar Mar 05 '23 20:03 markcmiller86

All the improvements sound reasonable to me.

brtnfld avatar Mar 06 '23 14:03 brtnfld

It all sounds good to me. Regarding PR #89. Currently, my time is limited so I do not expect to finalize this pull request soon. Having said that, I will get back to it when my schedule allows it.

jwsblokland avatar Mar 09 '23 12:03 jwsblokland