Unable to write an array of compound types with `std::string`
Describe the bug
I have a compound type that includes a std::string member. When I create an array of it (std::vector<CompoundTypeWithString>) and try to write it, HDF5 crashes with a segmentation fault:
Program received signal SIGSEGV, Segmentation fault.
0x00005555558bd344 in H5T__vlen_mem_str_getlen (file=<optimized out>, _vl=0x7ffff78e6038, len=0x7fffffffcf28) at .../HDF5/src/H5Tvlen.c:617
617 *len = HDstrlen(s);
To Reproduce
Here is example code that reproduces the problem (if you remove the second element from the array, the executable runs without error):
#include <string>
#include <vector>
#include <highfive/H5File.hpp>
#include <highfive/H5DataSet.hpp>
#include <highfive/H5DataSpace.hpp>
#include <highfive/H5DataType.hpp>
using namespace HighFive;
typedef struct {
    double x;
    double y;
    double z;
    std::string name;
} CT;
// Describe CT to HDF5: three doubles followed by a string member.
CompoundType create_compound_CT() {
    CompoundType t(
        {
            {"x", AtomicType<double>{}},
            {"y", AtomicType<double>{}},
            {"z", AtomicType<double>{}},
            {"name", AtomicType<std::string>{}}
        });
    return t;
}
HIGHFIVE_REGISTER_TYPE(CT, create_compound_CT)
int main(int, char**) {
    File file("compound.h5", File::ReadWrite | File::Create | File::Truncate);
    CompoundType t = create_compound_CT();
    t.commit(file, "CT");
    std::vector<CT> data = {
        {1, 1, 1, "one"},
        {2, 2, 2, "two"} // with only one element in the array the program runs successfully
    };
    auto dataset = file.createDataSet("data", DataSpace::From(data), t);
    dataset.write(data); // the segmentation fault occurs here
    return 0;
}
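One plausible explanation for why a single element works while two elements crash is a stride mismatch. The sketch below assumes (this is not confirmed in the thread) that the compound built without explicit offsets is laid out as three doubles plus one char*, the in-memory form of an HDF5 variable-length string, rather than with sizeof(CT). The helper names assumed_hdf5_element_size and strides_mismatch are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <string>

// Same members as the CT struct in the reproducer above.
struct CT {
    double x;
    double y;
    double z;
    std::string name;
};

// Hypothetical helper: the per-element size HDF5 would use IF the compound is
// laid out as three doubles followed by one char* (assumption, see lead-in).
constexpr std::size_t assumed_hdf5_element_size() {
    return 3 * sizeof(double) + sizeof(char*);
}

// True when that assumed stride is smaller than the real stride of
// std::vector<CT>. In that case element 1 and later would be read from
// addresses inside the previous element's std::string object.
constexpr bool strides_mismatch() {
    return assumed_hdf5_element_size() < sizeof(CT);
}
```

Under that assumption the first element can appear to work by accident on standard library implementations where std::string happens to begin with its data pointer, while the second element is read from the middle of the first one. That would be consistent with the crash inside H5T__vlen_mem_str_getlen, which calls strlen on a pointer taken from the buffer.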
Expected behavior
I expect that a std::string member can be embedded in a compound type and written to a dataset.
Desktop (please complete the following information):
- OS: Ubuntu 20.04
- Version: master branch
- HDF5: 1.12
Maybe @ferdonline remembers the compatibility issues between the string type and compound data types?
@pramodk thank you for the response.
I think I have found a way to partially work around this.
The idea is to keep a const char * that points to the contents of the std::string member variable and, when doing I/O with HDF5, to explicitly specify the member offsets with HOFFSET, using the const char * member for the string.
The problem arises after reading: the memory allocated for the variable-length strings has to be freed with the HDF5 function H5Treclaim(). I have to do that manually, whereas HighFive calls it inside data_converter() when reading into std::string or std::vector<std::string>, and so prevents memory leaks in that case.
I guess it is possible to add a data_converter that works with compound types such as Compound and std::vector<Compound>, checks whether the compound has variable-length string members (using the HDF5 C API), and calculates the correct offsets for them. But that would require some work (especially calculating the offsets and copying the strings from a temporary buffer into the user's variable), so it is not easy and may even be impossible.
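The copying step mentioned above could look roughly like the following. This is only a sketch in plain C++, not HighFive's data_converter: CTRaw, to_raw and from_raw are hypothetical names, the HDF5 read/write calls are left out, and the offsets are simply those of a plain struct whose string member is a char pointer.

```cpp
#include <string>
#include <vector>

// User-facing record, as in the reproducer.
struct CT {
    double x, y, z;
    std::string name;
};

// Hypothetical I/O mirror of CT: a plain struct that holds a char pointer
// in the place where a variable-length string member would live.
struct CTRaw {
    double x, y, z;
    const char* name;
};

// Build a temporary buffer for writing. The pointers borrow from `data`,
// so `data` must stay alive and unmodified while the buffer is in use.
std::vector<CTRaw> to_raw(const std::vector<CT>& data) {
    std::vector<CTRaw> raw;
    raw.reserve(data.size());
    for (const CT& d : data) {
        raw.push_back(CTRaw{d.x, d.y, d.z, d.name.c_str()});
    }
    return raw;
}

// After reading into a temporary buffer, copy every C string into an owning
// std::string. Releasing the buffer's strings is still the caller's job
// (for data read by HDF5, that is the H5Treclaim() step discussed above).
std::vector<CT> from_raw(const std::vector<CTRaw>& raw) {
    std::vector<CT> data;
    data.reserve(raw.size());
    for (const CTRaw& r : raw) {
        data.push_back(CT{r.x, r.y, r.z,
                          r.name ? std::string(r.name) : std::string()});
    }
    return data;
}
```

With a mirror like this the user's struct never has to expose a raw pointer, at the cost of one extra copy of the array per read or write.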
typedef struct Point {
    Point() {}
    Point(const double& x,
          const double& y,
          const double& z)
    {
        this->p[0] = x;
        this->p[1] = y;
        this->p[2] = z;
    }
    void setX(const double& x) { p[0] = x; }
    void setY(const double& y) { p[1] = y; }
    void setZ(const double& z) { p[2] = z; }
    double& x() { return p[0]; }
    double& y() { return p[1]; }
    double& z() { return p[2]; }
    // Store the name and re-point cname at the string's buffer.
    void setName(const std::string& name) {
        this->name = name;
        this->cname = this->name.c_str();
    }
    // Rebuild a std::string from cname, which is what HDF5 fills in on read.
    std::string getName() {
        if (this->cname == nullptr)
            return std::string();
        return std::string(this->cname);
    }
    double p[3];
private:
    std::string name;
    // HDF5 reads and writes this pointer. A copied Point keeps the source's
    // pointer until setName() is called on the copy.
    const char *cname = name.c_str();
    friend h5gt::CompoundType compound_Point();
} Point;
// Compound description with explicit offsets; "name" maps to cname.
inline CompoundType compound_Point() {
    CompoundType t(
        {
            {"x", AtomicType<double>{}, HOFFSET(Point, p[0])},
            {"y", AtomicType<double>{}, HOFFSET(Point, p[1])},
            {"z", AtomicType<double>{}, HOFFSET(Point, p[2])},
            {"name", AtomicType<std::string>{}, HOFFSET(Point, cname)},
        }, sizeof(Point));
    return t;
}
As you can see, setName() copies the string into the std::string member variable and points the const char * at it.
Read and write operations go through the const char * member.
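One thing to watch with a cached pointer like this is copying: the implicitly generated copy constructor copies the pointer value, so a copy would still point at the original object's buffer. A minimal sketch of one way to handle that is shown below. NamedPoint and its member names are hypothetical, and the class is reduced to the parts that matter for the pointer.

```cpp
#include <string>

class NamedPoint {
public:
    NamedPoint() = default;

    // Copy the data, then point cname_ at this object's own string buffer.
    NamedPoint(const NamedPoint& other)
        : name_(other.name_), cname_(name_.c_str()) {
        for (int i = 0; i < 3; ++i) p[i] = other.p[i];
    }

    NamedPoint& operator=(const NamedPoint& other) {
        if (this != &other) {
            for (int i = 0; i < 3; ++i) p[i] = other.p[i];
            name_ = other.name_;
            cname_ = name_.c_str();  // refresh after the string changed
        }
        return *this;
    }

    void setName(const std::string& name) {
        name_ = name;
        cname_ = name_.c_str();
    }

    const char* cName() const { return cname_; }

    // True when the cached pointer refers to this object's own buffer.
    bool pointsAtOwnBuffer() const { return cname_ == name_.c_str(); }

    double p[3] = {0.0, 0.0, 0.0};

private:
    std::string name_;
    const char* cname_ = name_.c_str();  // name_ is initialized first
};
```

Because a copy constructor is declared, no move constructor is generated implicitly, so a std::vector<NamedPoint> that reallocates falls back to these copies and every element keeps a pointer into its own string.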