Inconsistent character encoding

Open EMBagheri opened this issue 2 years ago • 4 comments

HighFive uses H5T_CSET_ASCII as the character encoding everywhere except in the function create_string(size_t length) in H5DataType_misc.hpp, where H5T_CSET_UTF8 is the default.

This might lead to inconsistencies, since strings created through different code paths end up tagged with different character sets.

EMBagheri avatar Mar 27 '22 22:03 EMBagheri

Thanks for the report, @emanmoba. As far as I can remember, string handling in HDF5 is a bit inconsistent, and this may have been introduced to work around certain limitations. For instance, for your own and my colleagues' reference, we found a bug in the HDF5 library not so long ago and provided temporary patches in HighFive to hide the issue:

https://github.com/BlueBrain/HighFive/pull/428
https://github.com/HDFGroup/hdf5/issues/544

As we have our internal HighFive meeting this week, I will bring this issue up for discussion, ok?

sergiorg-hpc avatar Mar 28 '22 06:03 sergiorg-hpc

Thanks a lot, @sergiorg-hpc. I did not know about the bug in the HDF5 library, but I'll look into it. For my application, I ran into some issues that were easily fixed by changing the default encoding to H5T_CSET_ASCII in the function create_string(size_t length), so I thought I should raise it here for other colleagues.

EMBagheri avatar Mar 28 '22 14:03 EMBagheri

@emanmoba We discussed this today in our internal HighFive meeting, and our colleague @1uc is going to look at it as soon as he is done with his ongoing work in HighFive, ok? Thanks again for the report. We will update the issue as soon as @1uc finds out more.

sergiorg-hpc avatar Mar 30 '22 12:03 sergiorg-hpc

In 2.8.0 we've reworked reading/writing std::strings. Choosing the character set (ASCII or UTF8) is now supported.
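For anyone deciding which of the two character sets to request, here is a minimal sketch of one possible rule: tag a string as ASCII only when every byte is in the 7-bit range, and as UTF-8 otherwise. The `Charset` enum and both helper functions are hypothetical names made up for this illustration. They are not part of HighFive or HDF5, and the sketch assumes any non-ASCII input is already valid UTF-8.

```cpp
#include <string>

// Hypothetical stand-in for the two HDF5 character sets discussed in this
// issue (H5T_CSET_ASCII and H5T_CSET_UTF8); not part of HighFive's API.
enum class Charset { Ascii, Utf8 };

// True if every byte lies in the 7-bit ASCII range (0x00..0x7F).
inline bool is_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c > 0x7F) {
            return false;
        }
    }
    return true;
}

// Pick the narrowest tag that describes the bytes. Assumption: non-ASCII
// input is already valid UTF-8; this sketch does not validate it.
inline Charset pick_charset(const std::string& s) {
    return is_ascii(s) ? Charset::Ascii : Charset::Utf8;
}
```

Since ASCII is a strict subset of UTF-8, tagging everything as UTF-8 would also describe the bytes correctly. The narrower tag may matter mainly for readers or tools that expect ASCII-tagged strings.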

I've marked the ticket as v3 so we can revisit the inconsistency if/when we decide to work towards a 3.0.0.

1uc avatar Nov 03 '23 09:11 1uc