h5cpp icon indicating copy to clipboard operation
h5cpp copied to clipboard

Cleanup the 3rd party support code

Open eugenwintersberger opened this issue 3 years ago • 21 comments

The current situation

  • all support for 3rd party libraries (including h5py or even the C++ standard library) lives in the source directories of the main library
  • some of the code is even included in header files required by the core library
  • the header files with the support data structures are included in the default hdf5.hpp file and thus always active without providing the user a chance to remove them

Suggested fixes

  • create a new hierarchy of directories below src/h5cpp/support
  • each supported library gets s subdirectory there. For instance: src/h5cpp/support/stl or src/h5cpp/support/h5py
  • the same structure should be provided for the tests below test/support Users who want to have support for STL data types can than easily add
#include <h5cpp/support/stl/vector.hpp>
#include <h5cpp/support/stl/string.hpp>

This would give the user a way to provide its own std::vector integration if he needs this.

How to get there

  • I would first start with moving the existing code and test the new structure
  • create a simple manual for developers how to integrate 3rd party support code

Once this is done (I hope during Christmas) it would be nice if others join in providing support for other 3rd party libraries.

eugenwintersberger avatar Dec 15 '21 08:12 eugenwintersberger

Up to you as the maintainer but I might suggest contrib rather than support. contrib is a fairly common folder name for this kind of thing.

planetmarshall avatar Dec 15 '21 12:12 planetmarshall

@planetmarshall contrib is definitely a much better choice.

eugenwintersberger avatar Dec 23 '21 16:12 eugenwintersberger

Ok. For STL my approach seems to work. However, there is some tight coupling between STL strings, vectors and HDF5 attributes. This is something I have to think about but this should not stop us from moving on. @jkotan A question for Jan: is the EBool something which is required for h5py? Somehow I forgot why this was introduced.

eugenwintersberger avatar Dec 24 '21 10:12 eugenwintersberger

@jkotan A question for Jan: is the EBool something which is required for h5py? Somehow I forgot why this was introduced.

Hi Eugen, the story is about NX_BOALEAN in NeXus.

At the beginning in the old libpniio versions to stores NX_BOOLEAN type variables DESY was used a predefined hdf5 integer type (like in C language). The problem is that when you read such a bool variable you cannot find if it is really boolean or integer one (unless you defined additional (optional) attribute type which discribes the type).

In NeXus format mapping to hdf5 format is not strictly defined i.e.

NX_BOOLEAN  true/false value ( true | 1 | false | 0 )

so it can be anything which can be mapped to true and false values. Therefore, we started to search for another option to store NX_BOOLEAN types. We didn't want to reinvent the wheel so we decided to adopt h5py a convention which stored python bool type variables in hdf5 Enum with two values FALSE and TRUE. In our case it is EBool. Moreover, a lot of people use h5py to create nexus files even they don't know they use enums for store bool type variables. Thus, our newer versions of libpniio as well as libpninexus library so that our newer detector servers and python-pninexus maps NeXus NX_BOOLEAN and bool c++/python types to the EBool enum. So for DESY it is standard way of storing bool type variables. But if you want to read back bool variables written by h5py you also need EBool. Otherwise you will read a variable to general enum type which you need to interpret by youself .

jkotan avatar Dec 25 '21 11:12 jkotan

@jkotan would it be ok for you if I move this to a contrib/nexus directory?

eugenwintersberger avatar Dec 26 '21 09:12 eugenwintersberger

@jkotan would it be ok for you if I move this to a contrib/nexus directory?

@eugenwintersberger , if you change only the header file location it should be OK

jkotan avatar Dec 26 '21 14:12 jkotan

So far so good. I am not sure if I understand why some of the builds fail. Maybe @jkotan has a good idea ;)

eugenwintersberger avatar Dec 27 '21 18:12 eugenwintersberger

Hi @eugenwintersberger, It looks like you need to add

#include <h5cpp/contrib/stl/string.hpp>

to dataset.hpp or somewhere on into its includes because it have different read/write methods for strings. Also it is good to add

--- a/src/h5cpp/contrib/stl/array.hpp
+++ b/src/h5cpp/contrib/stl/array.hpp
@@ -1,6 +1,8 @@
 #pragma once
 
 #include <h5cpp/datatype/type_trait.hpp>
+#include <h5cpp/dataspace/type_trait.hpp>
+#include <h5cpp/dataspace/simple.hpp>
 #include <array>
 
 namespace hdf5 { 

and

--- a/src/h5cpp/contrib/stl/string.hpp
+++ b/src/h5cpp/contrib/stl/string.hpp
@@ -1,6 +1,7 @@
 #pragma once
 
 #include <h5cpp/datatype/type_trait.hpp>
+#include <h5cpp/datatype/string.hpp>
 #include <string>
 
 namespace hdf5 { 

and some other include statements to your test h5cpp/contrib is not in h5cpp/hdf5.hpp. After that the code works on deb10 in the Release mode (for the debug mode for some compiler versions additional symbols are left in .so so you don't need to add includes statements )

For macos it is another independent issue, i.e. some conan packages were update and they started to require the newer conan version (I'm not a fan of package managers not based on distribution model). So we can either upgrade the conan version or downgrade the zlib package.

jkotan avatar Dec 28 '21 18:12 jkotan

Additionally after downgrading the zlib conan package to 1.2.8 for macos all the tests are green https://jenkins.esss.dk/dm/blue/organizations/jenkins/ess-dmsc%2Fh5cpp/detail/issue_548_tests/5/pipeline

jkotan avatar Dec 30 '21 17:12 jkotan

@jkotan thanks for your comments. It seems that I really should do more C++ to stay fit. What I do not understand is why this problems show up only on some of the test systems. I would expect missing header files to show up in any case. Or do I miss here something?

eugenwintersberger avatar Dec 31 '21 15:12 eugenwintersberger

What I do not understand is why this problems show up only on some of the test systems. I would expect missing header files to show up in any case. Or do I miss here something?

Good question :+1: It looks like a similar problem to https://stackoverflow.com/questions/42356633/why-would-a-release-build-have-linker-errors-if-the-debug-build-didnt i.e. I'm not sure but it looks like some code instantiates the the create and get template methods but since it is inline one it is optimized their symbols out.

jkotan avatar Dec 31 '21 16:12 jkotan

Playing with the code I have found that the instantiation of create and get template methods with string is done in dataset.cpp in the line

void Dataset::write(const char *data,const property::DatasetTransferList &dtpl)
{
  write(std::string(data),dtpl);
}

that if you comment the content of the above method, i.e. write(std::string(data),dtpl);, the linker errors disappear. Thus, to be more precisely it would be good to move the include statement from dataset.hpp to dataset.cpp. Also I've noticed that there is missing an const implementation of the similar write method. Thus we could apply the following patch

--- a/src/h5cpp/node/dataset.cpp
+++ b/src/h5cpp/node/dataset.cpp
@@ -33,6 +33,7 @@
 #include <h5cpp/node/functions.hpp>
 #include <h5cpp/filter/external_filter.hpp>
 #include <h5cpp/error/error.hpp>
+#include <h5cpp/contrib/stl/string.hpp>
 
 namespace hdf5 {
 namespace node {
@@ -211,6 +212,11 @@ void Dataset::write(const char *data,const property::DatasetTransferList &dtpl)
   write(std::string(data),dtpl);
 }
 
+void Dataset::write(const char *data,const property::DatasetTransferList &dtpl) const
+{
+  write(std::string(data),dtpl);
+}
+
 filter::ExternalFilters Dataset::filters() const
 {
   filter::ExternalFilters efilters = filter::ExternalFilters();

Another option would be to move the implementation of the write method to the header but since it is not template method it is better not to do it.

jkotan avatar Jan 01 '22 15:01 jkotan

It looks like it is even more tricky. I've just run tests where I have moved the include statement from dataset.hpp to dataset.cpp and in that time the macos compiler has complained https://jenkins.esss.dk/dm/blue/organizations/jenkins/ess-dmsc%2Fh5cpp/detail/issue_548_tests/6/pipeline so it is better to keep #include <h5cpp/contrib/stl/string.hpp> in dataset.hpp.

jkotan avatar Jan 01 '22 15:01 jkotan

I had a look on Jans changes. Looks all good to me. The installation of the header files seems to work. @jkotan do you think it makes sense to create a PR for this issues?

eugenwintersberger avatar Jan 04 '22 17:01 eugenwintersberger

Hi @eugenwintersberger, I think you can merge them into the your PR. All the changes are in the issue_548_tests branch.

jkotan avatar Jan 04 '22 17:01 jkotan

I had a look on Jans changes. Looks all good to me. The installation of the header files seems to work. @jkotan do you think it makes sense to create a PR for this issues?

Sure, make the PR. As I understand after merging the PR the user need to add #include statements with contrib headers or his/her own trait headers which he/she wants to use for specific c++ type/class. It looks like the user only cannot replace his/her own std::string trait class version because #include <h5cpp/contrib/stl/string.hpp> is included in dataset.hpp

jkotan avatar Jan 05 '22 10:01 jkotan

Hi Jan. Long time no see. I see your problem. This is a bit tricky to resolve since there is this tight coupling between the dataset IO code and the string type. If I recall correctly this is due to the fact that, for some reason I forgot, we handle variable length string and fixed length sting IO different from general variable/fixed length data IO. I was afraid that this technical debt we (most probably mainly I) introduced quite early in the project.

eugenwintersberger avatar Aug 08 '22 10:08 eugenwintersberger

Ok. It seems that the string trait was not the problem. I removed now all the contrib header files from the core library. I think it is if people use your master header file h5cpp.hpp everything should work normal. The only problem that remains is, that the code does not build on OSX (see branch issue_548.

eugenwintersberger avatar Aug 09 '22 06:08 eugenwintersberger

Hi Eugen,

h5cpp.hpp contrary to hdf5.hpp contains stl, i.e.

#include <h5cpp/hdf5.hpp>
#include <h5cpp/contrib/stl/stl.hpp>

jkotan avatar Aug 09 '22 06:08 jkotan

I guess this is a reasonable approach to provide people with a reasonable way to start so they don't have to think about adding another header file. I will try to fix the clang problem today (without any success promises ;))

eugenwintersberger avatar Aug 09 '22 06:08 eugenwintersberger

Tests are passing except for Windows. I have no Windows installation here so I will deal with this next week when I am back at home.

eugenwintersberger avatar Aug 09 '22 11:08 eugenwintersberger