glaze icon indicating copy to clipboard operation
glaze copied to clipboard

std::span<char const> is not serializable

Open arturbac opened this issue 7 months ago • 14 comments

std::span is not serializable and glz::write_supported<std::span,glz::JSON> == true, same applies to std::vector<char because there are no implicit convertions to std::string_view from vector and span and there is a bug in array serialization see below

include/glaze/json/write.hpp:563:29: error: no viable conversion from returned value of type 'const std::span<const char>' to function return type 'const sv' (aka 'const basic_string_view<char>')
  563 |                      return value;
      |                             ^~~~~
template <class T>
      requires str_t<T> || char_t<T>
   struct to<JSON, T>
[...]
else {
               const sv str = [&]() -> const sv {
                  if constexpr (!char_array_t<T> && std::is_pointer_v<std::decay_t<T>>) {
                     return value ? value : "";
                  }
                  else if constexpr (array_char_t<T>) {
                     return *value.data() ? sv{value.data()} : "";
                  }
                  else {
                     return value;
                  }
               }();

probable fix, important is that span MAY contain '\0' in the middle of array, so I assume returned std::string_view::size is respected here and the case for array_char_t<T> has a bug assuming array does not conatain '\0' in the middle

            else {
               const sv str = [&]() -> const sv {
                  if constexpr (!char_array_t<T> && std::is_pointer_v<std::decay_t<T>>) {
                     return value ? value : "";
                  }
                  // IMHO this is redudant when convertible to span is added
                  else if constexpr (array_char_t<T>) {
                     //this is invalid ->  return *value.data() ? sv{value.data()} : "";
                     return std::string_view{ value.data(), value.size()};
                  }
                  else if constexpr ( std::convertible_to<std::span<char const>, T> )
                     {
                     auto res{ static_cast<std::span<char const>>(value)};
                     return std::string_view{ res.data(), res.size( )};
                     }
                  else {
                     return value;
                  }
               }();

arturbac avatar May 26 '25 10:05 arturbac

I see now in tests of "char array write" and "char array read" that You intentionally break array storage and treat array as char *, IMHO which does not make any sense for arrays and spans. Would You agree to change behaviour or array/span storage to respect array/span bounds and include non printable characters ? initial tests with adding in write

    else if constexpr (std::ranges::contiguous_range<T>) {
       return std::string_view{ std::ranges::begin(value), std::ranges::end(value)};
     }

shows everything is encoded as it should, similar change would be necessary in read. I can work on this, already started but I am not sure You would agree so it is worth finishing ?

arturbac avatar May 27 '25 20:05 arturbac

Thanks for looking into this. It would be great to support std::span<char const>.

//this is invalid -> return *value.data() ? sv{value.data()} : "";

I think you might be confused with what this code is doing.

  • Checks the first character (if it is null it returns a null string to write)
  • If the first character of the char array is not null then it constructs a std::string_view
  • The std::string_view will iterate until it reaches the first null character to determine the size
  • This means no null characters will be written out This behavior is important for interfacing with C programs and low level hardware interfaces, where fixed size buffers are used and the size is determined by the location of the null character.

This code is not redundant for when std::convertible_to<std::span<char const>, T> is added. I really like this idea, but it doesn't work for const char* my_string = "text", this will not satisfy std::convertible_to for std::span<char const>. std::span does not iterate until the first null character, which is good for performance.

I do agree with you and think std::span should respect bounds and include non printable characters, especially for performance reasons. I'm a little more cautious about changing the behavior of std::array<char, N>. We need a way to get the same behavior as a C style char array (char my_string[16]), where we only use the length of the string up until the null character. We need this behavior in a type like std::array, because the ergonomics of a C style char array are not good for general C++ usage, especially when it comes to compile time programming.

There is an active pull request for adding support for C++26's upcoming inplace_vector: https://github.com/stephenberry/glaze/pull/1729

This is a stack allocated type that conveys the meaning that the size is dynamic. Once we get this merged, I would be more comfortable with inplace_vector<char, N> being converted to std::span, where we use the size directly like you are asking. And, we leave std::array<char, N> for handling C apis in C++.

There is a lot of nuance here as we want to design for the least astonishment. Keep sharing your thoughts!

stephenberry avatar May 28 '25 13:05 stephenberry

And to explain why I am looking into that, I have a code which has string tables containing large string data as single table + table of offsets. So it allows single load of 2 tables at once instead of crating thousands of strings (often most of them larger than in class buffer) in vector and just refer to it with string_view. For diagnostic purposes/tests I need to load test data from json for comparison and this is not possible currently for any storage with chars when there are non printable chars like \0. So I was at first shocked that array<> size is ignored. And IMHO std::array<> does not need to be handled same way as C array for which in most cases You can not handle like sized array but only null terminated array from raw char ptr. IMHO for consistency, quality, predictability of user code, ranged containers, vector, span, array should always be handle with respect of their property size, and IMHO ppl would expect that from this library and will not expect that array<int> is handled with respect to its size and array<char> not ... by the way int8_t or uint8_t depending on arch/compilation mode points to signed/unsigned char so when people store array of some 8 bit integers glz will just truncate data and I do not think ppl would expect that..

arturbac avatar May 28 '25 21:05 arturbac

I will mention again here is active buffer overflow

else if constexpr (array_char_t<T>) {
       return *value.data() ? sv{value.data()} : "";

when sized array of chars (only) does not have 0 at all

arturbac avatar May 28 '25 21:05 arturbac

I will mention again here is active buffer overflow

Good point, although this is a corner case and it's impossible to check if a const char* actually points to something in range.

But, we can add better checks for types like std::array<char, 0>, I hadn't considered this case.

stephenberry avatar May 28 '25 21:05 stephenberry

For any size array when there is not single char \0 stored array<char,2> { 'a', 'b' )

arturbac avatar May 28 '25 21:05 arturbac

Oh yes, that is bad default behavior.

stephenberry avatar May 28 '25 21:05 stephenberry

And to explain why I am looking into that, I have a code which has string tables containing large string data as single table + table of offsets...

Thanks for this explanation. I can understand why the std::array behavior might be surprising. We could introduce a new type like glz::null_terminated_array for use cases that std::array is currently filling the role for, which is used all the time for interacting with C APIs. the downside is that this causes C++ code that could typically just use the standard library to now need to use a custom Glaze type in order to interact with a C API.

There are two very useful approaches to how to handle std::array data.

In any case, I think a compile time option is the best solution to support the C API with a null termination. And, I think your proposal should become the default behavior.

stephenberry avatar May 28 '25 21:05 stephenberry

We could introduce a new type like glz::null_terminated_array

IMHO better would be use Opts for trait how to handle array, default to respect size and optional use null termination instead if exists with handling array bounds at the same time. Because even when You use std::array for C iterop with null termination You still want to prevent buffer overflow caused by mistakes which cost almost nothing. So size should always be respected in regard to array bounds

arturbac avatar May 28 '25 22:05 arturbac

I agree, that was my final comment. If you want to go ahead and make this change I'd welcome a pull request.

stephenberry avatar May 28 '25 22:05 stephenberry

Sure, I can deliver it on weekend.

arturbac avatar May 28 '25 22:05 arturbac

Thanks, take as much time as you need.

stephenberry avatar May 28 '25 23:05 stephenberry

I am sorry but during this weekend I ca not deliver it, so fell free to do it if You wish. I have unfinished task in simple enum I need for work and it needs more time to finish than I expected.

arturbac avatar Jun 01 '25 11:06 arturbac

No problem, I'll let you know when I start working on this, otherwise do the same and I'll wait for a pull request.

stephenberry avatar Jun 01 '25 11:06 stephenberry