HighFive icon indicating copy to clipboard operation
HighFive copied to clipboard

Planning the features of HighFive v3.

Open 1uc opened this issue 1 year ago • 10 comments
trafficstars

This issue collects the features that we would like HighFive to have, but we can only do so by breaking API or build-systems. It is also an invitation for feedback from users of HighFive.

  1. Improve the CMake build-system. I see two paths:
    • Restructure the headers such that including a header highfive/boost.hpp would activate boost support. This is more similar to how Pybind11 and Boost handle dependencies.
    • Use targets such as HighFive::boost to activate boost support. This was proposed in the past: #257 and discussed again here #710.
    • Related PRs: #892, #897, #903.
  2. Restructure headers into forward declarations, declarations and definitions. This should allow us to avoid the cyclic structure. This might slightly change what each header implicitly provides.

Small decided changes:

  • There's some clumsiness with creating scalar and empty DataSpaces, see #747.
  • Default CSET for strings. Currently, it's usually ASCII, but sometimes UTF8.
  • Unify read and write_raw, see #928.
  • Remove FixLenStringArray, see #906.
  • Refactor inspector, see #918; and #939, #938

Changes that need review/discussion:

  • Be more strict about RAII failure. Currently, we only decrement the reference count of valid object; and do nothing for invalid objects, including those that have incorrectly already been reduced to refcount 0.
  • Review unifying operator==, see #401.
  • Check char[N] warrant changing, see #865.
  • I propose to stop having H5P_DEFAULT has the _hid of a default constructed PropertyList. Probably required for #786.
  • #907.
  • #934

1uc avatar Nov 24 '23 09:11 1uc

Here's a sketch of how we'd need to reorganize the header for the first approach: https://github.com/BlueBrain/HighFive/pull/892

1uc avatar Dec 13 '23 15:12 1uc

  1. (CMake) I'm all for it! (I have positioned myself in the past, so I will not repeat my arguments/preference here).
  2. (Style) I would rather have as little as possible forward declarations, and just put the implementation with the declarations where possible (and use forward declarations only if absolutely necessary). I find this way more maintainer friendly. Instead, the code structure can easily be grasped from the documentation, such as the doxygen docs as available online.

A thing to discuss is if you want to include H5Easy implemenations for Eigen, Boost, xtensor, openCV deeper in the core or not.

tdegeus avatar Dec 18 '23 16:12 tdegeus

A thing to discuss is if you want to include H5Easy implemenations for Eigen, Boost, xtensor, openCV deeper in the core or not.

Yes, this is one of the outstanding chores. It starts with #871 so that we can systematically test all combinations and can't make the Eigen mistake again. After that we reduce duplication in the tests #868 because I fear we'll soon slip into unmaintainable. Then one could start extending the trait HighFive::details::inspector until it covers everything in Easy. I've not included this, because I doubt it'll break API. The inspector isn't part of the public API and in HighFive this trait is the only way we inject container specific code. Hence, we should be free to do whatever we like. I'm not very familiar with Easy, but I wasn't expecting it to expose details about how it does these types of things. Are you aware of anything? There was something about Eigen column and row vectors, either Easy makes the distinction or HighFive does, but I don't think both do. Even if that were the case I'm not sure unifying the behaviour is worth the trouble it causes our users.

So far we're not planning on anything too disruptive. Just rectifying error prone or quirky behaviour and fixing the build-system. Sometimes these can be fixed without actually breaking well-behaved code.

1uc avatar Dec 18 '23 19:12 1uc

Regarding Style. I much prefer splitting definition and declaration. However, the point isn't cosmetic. If one's unlucky and one needs to change which files includes which other files, then HighFive can be quite uncompromising. Since C++ does really allow fully cyclic code, it should always be possible to have the preprocessor figure out what to include in which order. However, the requirement for this work, is to split declaration from definition, like one does in regular C++ code (decl in .h and defn in .cpp). In some unlucky cases one may need forward declarations. However, I think that choice has been made for us. Either the code needs the forward declaration or it doesn't. Shuffling around code will probably not help us.

Additionally, splitting the two will enable users to write code such as:

// file: write_stuff.hpp
#include <highfive/dataset_decl.hpp>
void write_data(const HighFive::DataSet& dset, const Container& containter);

// file: write_stuff.cpp
void write_data(...) { ... }

If someone really wants to reduce compile times it might help. It would also allow users to decide that they want to explicitly instantiate some of our templates and link to a "pre-compiled" HighFive.

1uc avatar Dec 18 '23 19:12 1uc

A thing to discuss is if you want to include H5Easy implemenations for Eigen, Boost, xtensor, openCV deeper in the core or not.

Yes, this is one of the outstanding chores. It starts with #871 so that we can systematically test all combinations and can't make the Eigen mistake again. After that we reduce duplication in the tests #868 because I fear we'll soon slip into unmaintainable. Then one could start extending the trait HighFive::details::inspector until it covers everything in Easy. I've not included this, because I doubt it'll break API. The inspector isn't part of the public API and in HighFive this trait is the only way we inject container specific code. Hence, we should be free to do whatever we like. I'm not very familiar with Easy, but I wasn't expecting it to expose details about how it does these types of things. Are you aware of anything? There was something about Eigen column and row vectors, either Easy makes the distinction or HighFive does, but I don't think both do. Even if that were the case I'm not sure unifying the behaviour is worth the trouble it causes our users.

  • H5Easy does not expose the user to any details concerning the HighFive implementation. I would say that that is the selling point: a single line, very generic API (similar to what can be done with h5py). I would not mind going even close to that API in the future (for example file["/path/to/dataset"] = variable for automatic dataset generation with a templated function). However, at the time that I wrote H5Easy there was a desire from the HighFive maintainers to have only free-functions on top of HighFive.
  • H5Easy explicitly only reads/writes data in row-major as expected from HDF5. I.e. the bug in HighFive did not affect H5Easy.

So far we're not planning on anything too disruptive. Just rectifying error prone or quirky behaviour and fixing the build-system. Sometimes these can be fixed without actually breaking well-behaved code.

OK! However, making a major update is also a nice time to reflect on API changes if needed or desired.

tdegeus avatar Dec 19 '23 08:12 tdegeus

P.S. Concerning CMake I also link https://github.com/conda-forge/highfive-feedstock/issues/41

tdegeus avatar Dec 19 '23 10:12 tdegeus

A prime example why we need a relatively unexciting version 3.0.0 that just fixes the way dependencies are handled, fixes the CMake and doesn't break too much else. Moreover, it should be feasible with limited resources. Not so much a new version, but rather a set of improvements that unfortunately can't be done without them being breaking, and as a consequence the major version number gets bumped.

1uc avatar Dec 19 '23 10:12 1uc

if I've read everything correctly, splitting definitions and declarations will mean that HighFive will no longer be a header-only library, right?

Why this choice?

Do you have a date for the finalization of the v3?

Let me know if I can help for CMake.

gouarin avatar Dec 19 '23 12:12 gouarin

if I've read everything correctly, splitting definitions and declarations will mean that HighFive will no longer be a header-only library, right?

HighFive will continue to be a header only library. In fact it'll make use of the fact that it's header-only more. Especially, in the CMake code.

The splitting happens into separate header files:

// file: highfive/H5File.hpp
#include "H5File_decl.hpp"
#include "H5File_impl.hpp"

and:

// file: highfive/H5File_decl.hpp
class File {
  DataSet getDataSet(...);
};

// file highfive/H5File_impl.hpp
DataSet File::getDataSet(...) { std::cout << "Yay!"; }

Then for extra points in pedantry, forward declarations (if added) need to be in a separate file (it's kinda uncommon though, so maybe we leave it off), and implementations of templates behave slightly differently from implementations of non-template functions. Hence, at it's most pedantic they'd get split into separate files too.

The only difference to what we do now is that we stop tail including:

// file: highfive/H5File.hpp
class File {
  DataSet getDataSet(...);
};

#include "highfive/H5File_defn.hpp"

The problem with this type of tail inclusion is that we can get only the declaration of File without it's implementation.

Let me know if I can help for CMake.

Thank you. We'll need testing in environments I'm unaware of and don't have easy access to via Github Actions. If you want to look at things way before they're ready for review see #897. In particular "Second:" in: https://github.com/BlueBrain/HighFive/pull/897#issuecomment-1862724398 would benefit from user feedback.

No, HighFive doesn't have a schedule. We work on it whenever other projects take forever to compile.

1uc avatar Dec 19 '23 13:12 1uc

Thank you for these clarifications. I will take a look at #897.

gouarin avatar Dec 19 '23 14:12 gouarin

We're done planning.

1uc avatar May 17 '24 09:05 1uc