h5cpp
h5cpp copied to clipboard
Cleanup the 3rd party support code
The current situation
- all support for 3rd party libraries (including h5py or even the C++ standard library) lives in the source directories of the main library
- some of the code is even included in header files required by the core library
- the header files with the support data structures are included in the default
hdf5.hpp
file and thus always active without providing the user a chance to remove them
Suggested fixes
- create a new hierarchy of directories below
src/h5cpp/support
- each supported library gets s subdirectory there. For instance:
src/h5cpp/support/stl
orsrc/h5cpp/support/h5py
- the same structure should be provided for the tests below
test/support
Users who want to have support for STL data types can than easily add
#include <h5cpp/support/stl/vector.hpp>
#include <h5cpp/support/stl/string.hpp>
This would give the user a way to provide its own std::vector
integration if he needs this.
How to get there
- I would first start with moving the existing code and test the new structure
- create a simple manual for developers how to integrate 3rd party support code
Once this is done (I hope during Christmas) it would be nice if others join in providing support for other 3rd party libraries.
Up to you as the maintainer but I might suggest contrib
rather than support
. contrib
is a fairly common folder name for this kind of thing.
@planetmarshall contrib
is definitely a much better choice.
Ok. For STL my approach seems to work. However, there is some tight coupling between STL strings, vectors and HDF5 attributes. This is something I have to think about but this should not stop us from moving on. @jkotan A question for Jan: is the EBool something which is required for h5py? Somehow I forgot why this was introduced.
@jkotan A question for Jan: is the EBool something which is required for h5py? Somehow I forgot why this was introduced.
Hi Eugen, the story is about NX_BOALEAN in NeXus.
At the beginning in the old libpniio
versions to stores NX_BOOLEAN type variables DESY was used a predefined hdf5 integer type (like in C language). The problem is that when you read such a bool variable you cannot find if it is really boolean or integer one (unless you defined additional (optional) attribute type which discribes the type).
In NeXus format mapping to hdf5 format is not strictly defined i.e.
NX_BOOLEAN true/false value ( true | 1 | false | 0 )
so it can be anything which can be mapped to true and false values.
Therefore, we started to search for another option to store NX_BOOLEAN types. We didn't want to reinvent the wheel so we decided to adopt h5py a convention which stored python bool type variables in hdf5 Enum with two values FALSE
and TRUE
. In our case it is EBool
. Moreover, a lot of people use h5py to create nexus files even they don't know they use enums for store bool type variables.
Thus, our newer versions of libpniio
as well as libpninexus
library so that our newer detector servers and python-pninexus maps NeXus NX_BOOLEAN and bool c++/python types to the EBool enum.
So for DESY it is standard way of storing bool type variables.
But if you want to read back bool variables written by h5py you also need EBool
. Otherwise you will read a variable to general enum type which you need to interpret by youself .
@jkotan would it be ok for you if I move this to a contrib/nexus
directory?
@jkotan would it be ok for you if I move this to a
contrib/nexus
directory?
@eugenwintersberger , if you change only the header file location it should be OK
So far so good. I am not sure if I understand why some of the builds fail. Maybe @jkotan has a good idea ;)
Hi @eugenwintersberger, It looks like you need to add
#include <h5cpp/contrib/stl/string.hpp>
to dataset.hpp
or somewhere on into its includes because it have different read/write methods for strings. Also it is good to add
--- a/src/h5cpp/contrib/stl/array.hpp
+++ b/src/h5cpp/contrib/stl/array.hpp
@@ -1,6 +1,8 @@
#pragma once
#include <h5cpp/datatype/type_trait.hpp>
+#include <h5cpp/dataspace/type_trait.hpp>
+#include <h5cpp/dataspace/simple.hpp>
#include <array>
namespace hdf5 {
and
--- a/src/h5cpp/contrib/stl/string.hpp
+++ b/src/h5cpp/contrib/stl/string.hpp
@@ -1,6 +1,7 @@
#pragma once
#include <h5cpp/datatype/type_trait.hpp>
+#include <h5cpp/datatype/string.hpp>
#include <string>
namespace hdf5 {
and some other include statements to your test h5cpp/contrib
is not in h5cpp/hdf5.hpp
.
After that the code works on deb10 in the Release mode (for the debug mode for some compiler versions additional symbols are left in .so so you don't need to add includes statements )
For macos it is another independent issue, i.e. some conan packages were update and they started to require the newer conan version (I'm not a fan of package managers not based on distribution model). So we can either upgrade the conan version or downgrade the zlib package.
Additionally after downgrading the zlib conan package to 1.2.8 for macos all the tests are green https://jenkins.esss.dk/dm/blue/organizations/jenkins/ess-dmsc%2Fh5cpp/detail/issue_548_tests/5/pipeline
@jkotan thanks for your comments. It seems that I really should do more C++ to stay fit. What I do not understand is why this problems show up only on some of the test systems. I would expect missing header files to show up in any case. Or do I miss here something?
What I do not understand is why this problems show up only on some of the test systems. I would expect missing header files to show up in any case. Or do I miss here something?
Good question :+1: It looks like a similar problem to https://stackoverflow.com/questions/42356633/why-would-a-release-build-have-linker-errors-if-the-debug-build-didnt i.e. I'm not sure but it looks like some code instantiates the the create
and get
template methods but since it is inline one it is optimized their symbols out.
Playing with the code I have found that the instantiation of create
and get
template methods with string
is done in dataset.cpp
in the line
void Dataset::write(const char *data,const property::DatasetTransferList &dtpl)
{
write(std::string(data),dtpl);
}
that if you comment the content of the above method, i.e. write(std::string(data),dtpl);
, the linker errors disappear.
Thus, to be more precisely it would be good to move the include statement from dataset.hpp
to dataset.cpp
. Also I've noticed that there is missing an const implementation of the similar write method. Thus we could apply the following patch
--- a/src/h5cpp/node/dataset.cpp
+++ b/src/h5cpp/node/dataset.cpp
@@ -33,6 +33,7 @@
#include <h5cpp/node/functions.hpp>
#include <h5cpp/filter/external_filter.hpp>
#include <h5cpp/error/error.hpp>
+#include <h5cpp/contrib/stl/string.hpp>
namespace hdf5 {
namespace node {
@@ -211,6 +212,11 @@ void Dataset::write(const char *data,const property::DatasetTransferList &dtpl)
write(std::string(data),dtpl);
}
+void Dataset::write(const char *data,const property::DatasetTransferList &dtpl) const
+{
+ write(std::string(data),dtpl);
+}
+
filter::ExternalFilters Dataset::filters() const
{
filter::ExternalFilters efilters = filter::ExternalFilters();
Another option would be to move the implementation of the write method to the header but since it is not template method it is better not to do it.
It looks like it is even more tricky. I've just run tests where I have moved the include statement from dataset.hpp to dataset.cpp and in that time the macos compiler has complained https://jenkins.esss.dk/dm/blue/organizations/jenkins/ess-dmsc%2Fh5cpp/detail/issue_548_tests/6/pipeline so it is better to keep #include <h5cpp/contrib/stl/string.hpp>
in dataset.hpp
.
I had a look on Jans changes. Looks all good to me. The installation of the header files seems to work. @jkotan do you think it makes sense to create a PR for this issues?
Hi @eugenwintersberger, I think you can merge them into the your PR. All the changes are in the issue_548_tests
branch.
I had a look on Jans changes. Looks all good to me. The installation of the header files seems to work. @jkotan do you think it makes sense to create a PR for this issues?
Sure, make the PR. As I understand after merging the PR the user need to add #include
statements with contrib
headers or his/her own trait headers which he/she wants to use for specific c++ type/class.
It looks like the user only cannot replace his/her own std::string
trait class version because #include <h5cpp/contrib/stl/string.hpp>
is included in dataset.hpp
Hi Jan. Long time no see. I see your problem. This is a bit tricky to resolve since there is this tight coupling between the dataset IO code and the string type. If I recall correctly this is due to the fact that, for some reason I forgot, we handle variable length string and fixed length sting IO different from general variable/fixed length data IO. I was afraid that this technical debt we (most probably mainly I) introduced quite early in the project.
Ok. It seems that the string trait was not the problem. I removed now all the contrib header files from the core library. I think it is if people use your master header file h5cpp.hpp
everything should work normal. The only problem that remains is, that the code does not build on OSX (see branch issue_548
.
Hi Eugen,
h5cpp.hpp
contrary to hdf5.hpp
contains stl, i.e.
#include <h5cpp/hdf5.hpp>
#include <h5cpp/contrib/stl/stl.hpp>
I guess this is a reasonable approach to provide people with a reasonable way to start so they don't have to think about adding another header file. I will try to fix the clang problem today (without any success promises ;))
Tests are passing except for Windows. I have no Windows installation here so I will deal with this next week when I am back at home.