HighFive
HighFive copied to clipboard
Planning the features of HighFive v3.
This issue collects the features that we would like HighFive to have, but we can only do so by breaking API or build-systems. It is also an invitation for feedback from users of HighFive.
- Improve the CMake build-system. I see two paths:
- Restructure the headers such that including a header
highfive/boost.hppwould activate boost support. This is more similar to how Pybind11 and Boost handle dependencies. - Use targets such as
HighFive::boostto activate boost support. This was proposed in the past: #257 and discussed again here #710. - Related PRs: #892, #897, #903.
- Restructure the headers such that including a header
- Restructure headers into forward declarations, declarations and definitions. This should allow us to avoid the cyclic structure. This might slightly change what each header implicitly provides.
Small decided changes:
- There's some clumsiness with creating scalar and empty
DataSpaces, see #747. - Default CSET for strings. Currently, it's usually ASCII, but sometimes UTF8.
- Unify
readandwrite_raw, see #928. - Remove
FixLenStringArray, see #906. - Refactor
inspector, see #918; and #939, #938
Changes that need review/discussion:
- Be more strict about RAII failure. Currently, we only decrement the reference count of valid object; and do nothing for invalid objects, including those that have incorrectly already been reduced to refcount 0.
- Review unifying
operator==, see #401. - Check
char[N]warrant changing, see #865. - I propose to stop having
H5P_DEFAULThas the_hidof a default constructedPropertyList. Probably required for #786. - #907.
- #934
Here's a sketch of how we'd need to reorganize the header for the first approach: https://github.com/BlueBrain/HighFive/pull/892
- (CMake) I'm all for it! (I have positioned myself in the past, so I will not repeat my arguments/preference here).
- (Style) I would rather have as little as possible forward declarations, and just put the implementation with the declarations where possible (and use forward declarations only if absolutely necessary). I find this way more maintainer friendly. Instead, the code structure can easily be grasped from the documentation, such as the doxygen docs as available online.
A thing to discuss is if you want to include H5Easy implemenations for Eigen, Boost, xtensor, openCV deeper in the core or not.
A thing to discuss is if you want to include H5Easy implemenations for Eigen, Boost, xtensor, openCV deeper in the core or not.
Yes, this is one of the outstanding chores. It starts with #871 so that we can systematically test all combinations and can't make the Eigen mistake again. After that we reduce duplication in the tests #868 because I fear we'll soon slip into unmaintainable. Then one could start extending the trait HighFive::details::inspector until it covers everything in Easy. I've not included this, because I doubt it'll break API. The inspector isn't part of the public API and in HighFive this trait is the only way we inject container specific code. Hence, we should be free to do whatever we like. I'm not very familiar with Easy, but I wasn't expecting it to expose details about how it does these types of things. Are you aware of anything? There was something about Eigen column and row vectors, either Easy makes the distinction or HighFive does, but I don't think both do. Even if that were the case I'm not sure unifying the behaviour is worth the trouble it causes our users.
So far we're not planning on anything too disruptive. Just rectifying error prone or quirky behaviour and fixing the build-system. Sometimes these can be fixed without actually breaking well-behaved code.
Regarding Style. I much prefer splitting definition and declaration. However, the point isn't cosmetic. If one's unlucky and one needs to change which files includes which other files, then HighFive can be quite uncompromising. Since C++ does really allow fully cyclic code, it should always be possible to have the preprocessor figure out what to include in which order. However, the requirement for this work, is to split declaration from definition, like one does in regular C++ code (decl in .h and defn in .cpp). In some unlucky cases one may need forward declarations. However, I think that choice has been made for us. Either the code needs the forward declaration or it doesn't. Shuffling around code will probably not help us.
Additionally, splitting the two will enable users to write code such as:
// file: write_stuff.hpp
#include <highfive/dataset_decl.hpp>
void write_data(const HighFive::DataSet& dset, const Container& containter);
// file: write_stuff.cpp
void write_data(...) { ... }
If someone really wants to reduce compile times it might help. It would also allow users to decide that they want to explicitly instantiate some of our templates and link to a "pre-compiled" HighFive.
A thing to discuss is if you want to include H5Easy implemenations for Eigen, Boost, xtensor, openCV deeper in the core or not.
Yes, this is one of the outstanding chores. It starts with #871 so that we can systematically test all combinations and can't make the Eigen mistake again. After that we reduce duplication in the tests #868 because I fear we'll soon slip into unmaintainable. Then one could start extending the trait
HighFive::details::inspectoruntil it covers everything in Easy. I've not included this, because I doubt it'll break API. Theinspectorisn't part of the public API and in HighFive this trait is the only way we inject container specific code. Hence, we should be free to do whatever we like. I'm not very familiar with Easy, but I wasn't expecting it to expose details about how it does these types of things. Are you aware of anything? There was something about Eigen column and row vectors, either Easy makes the distinction or HighFive does, but I don't think both do. Even if that were the case I'm not sure unifying the behaviour is worth the trouble it causes our users.
- H5Easy does not expose the user to any details concerning the HighFive implementation. I would say that that is the selling point: a single line, very generic API (similar to what can be done with
h5py). I would not mind going even close to that API in the future (for examplefile["/path/to/dataset"] = variablefor automatic dataset generation with a templated function). However, at the time that I wrote H5Easy there was a desire from the HighFive maintainers to have only free-functions on top of HighFive. - H5Easy explicitly only reads/writes data in row-major as expected from HDF5. I.e. the bug in HighFive did not affect H5Easy.
So far we're not planning on anything too disruptive. Just rectifying error prone or quirky behaviour and fixing the build-system. Sometimes these can be fixed without actually breaking well-behaved code.
OK! However, making a major update is also a nice time to reflect on API changes if needed or desired.
P.S. Concerning CMake I also link https://github.com/conda-forge/highfive-feedstock/issues/41
A prime example why we need a relatively unexciting version 3.0.0 that just fixes the way dependencies are handled, fixes the CMake and doesn't break too much else. Moreover, it should be feasible with limited resources. Not so much a new version, but rather a set of improvements that unfortunately can't be done without them being breaking, and as a consequence the major version number gets bumped.
if I've read everything correctly, splitting definitions and declarations will mean that HighFive will no longer be a header-only library, right?
Why this choice?
Do you have a date for the finalization of the v3?
Let me know if I can help for CMake.
if I've read everything correctly, splitting definitions and declarations will mean that HighFive will no longer be a header-only library, right?
HighFive will continue to be a header only library. In fact it'll make use of the fact that it's header-only more. Especially, in the CMake code.
The splitting happens into separate header files:
// file: highfive/H5File.hpp
#include "H5File_decl.hpp"
#include "H5File_impl.hpp"
and:
// file: highfive/H5File_decl.hpp
class File {
DataSet getDataSet(...);
};
// file highfive/H5File_impl.hpp
DataSet File::getDataSet(...) { std::cout << "Yay!"; }
Then for extra points in pedantry, forward declarations (if added) need to be in a separate file (it's kinda uncommon though, so maybe we leave it off), and implementations of templates behave slightly differently from implementations of non-template functions. Hence, at it's most pedantic they'd get split into separate files too.
The only difference to what we do now is that we stop tail including:
// file: highfive/H5File.hpp
class File {
DataSet getDataSet(...);
};
#include "highfive/H5File_defn.hpp"
The problem with this type of tail inclusion is that we can get only the declaration of File without it's implementation.
Let me know if I can help for CMake.
Thank you. We'll need testing in environments I'm unaware of and don't have easy access to via Github Actions. If you want to look at things way before they're ready for review see #897. In particular "Second:" in: https://github.com/BlueBrain/HighFive/pull/897#issuecomment-1862724398 would benefit from user feedback.
No, HighFive doesn't have a schedule. We work on it whenever other projects take forever to compile.
Thank you for these clarifications. I will take a look at #897.
We're done planning.