HighFive
Inconsistent character encoding
HighFive uses H5T_CSET_ASCII for character encoding everywhere except in the function create_string(size_t length) inside H5DataType_misc.hpp, where H5T_CSET_UTF8 is used as the default encoding.
This might lead to some inconsistencies.
Thanks for the report, @emanmoba. As far as I can remember, string handling in HDF5 is a bit inconsistent and maybe this is something that was introduced to overcome certain limitations. For instance, for your own and my colleagues' reference, we found a bug in the HDF5 library not so long ago and provided temporary patches in HighFive to hide the issue:
https://github.com/BlueBrain/HighFive/pull/428 https://github.com/HDFGroup/hdf5/issues/544
As we have our internal HighFive meeting this week, I will bring this issue up for discussion, ok?
Thanks a lot, @sergiorg-hpc. I did not know about the bug in the HDF5 library, but I'll look into it. For my application, I ran into some issues that were easily fixed by changing the default encoding to H5T_CSET_ASCII in the function create_string(size_t length), so I thought I should raise it here for other colleagues.
@emanmoba We discussed this today in our internal HighFive meeting, and our colleague @1uc is going to look at it as soon as he is done with some ongoing work in HighFive, ok? Thanks again for the report; we will update the issue as soon as @1uc finds out more.
In 2.8.0 we've reworked reading/writing std::strings. Choosing the character set (ASCII or UTF8) is now supported.
I've marked the ticket as v3 so we can revisit the inconsistency if/when we decide to work towards a 3.0.0.