Direct serialization

Open kelbon opened this issue 1 year ago • 6 comments

Does the library provide an interface to serialize structs into a JSON string without creating a json value? As far as I can see, serialization requires a json value.

kelbon avatar Oct 04 '24 13:10 kelbon

Direct serialization will be released with Boost 1.87.

grisumbras avatar Oct 04 '24 13:10 grisumbras

What interface will it provide?

kelbon avatar Oct 04 '24 13:10 kelbon

Basically the same as with value serialization. https://www.boost.org/doc/libs/develop/libs/json/doc/html/json/ref/boost__json__serialize/overload5.html

grisumbras avatar Oct 04 '24 13:10 grisumbras

So will it work with Boost.PFR / Boost.Describe structs, or what? I need a lower-level interface for serializing strings, ints, keys, etc.

kelbon avatar Oct 04 '24 13:10 kelbon

Such an interface, which returns a string, does not fit output iterators (which avoid copying the string after it is created). I also think an overload like serialize(auto& value, options) may break many overloads in existing code.

kelbon avatar Oct 04 '24 13:10 kelbon

I can assure you that we have tests for this, no overload ambiguity happens.

If you want something more granular, serializer will also support direct operation. https://www.boost.org/doc/libs/develop/libs/json/doc/html/json/ref/boost__json__serializer/reset/overload5.html

grisumbras avatar Oct 04 '24 14:10 grisumbras

Now that 1.87.0 is out, this issue is resolved.

grisumbras avatar Dec 16 '24 11:12 grisumbras

I don't think this is what I expected from direct serialization. I was asking about a SAX interface for writing strings, integers, doubles, etc., similar to the direct parsing interface.

kelbon avatar Dec 16 '24 12:12 kelbon

Direct parsing doesn't provide a SAX interface in this library. It's the other way around: direct parsing is implemented on top of a SAX interface.

That being said, what do you need event-based serialisation for?

grisumbras avatar Dec 17 '24 16:12 grisumbras

To avoid creating a json object when serializing my struct.

kelbon avatar Dec 17 '24 16:12 kelbon

But this is exactly what direct serialisation achieves. See these: https://www.boost.org/doc/libs/1_87_0/libs/json/doc/html/json/ref/boost__json__serializer/reset/overload5.html https://www.boost.org/doc/libs/1_87_0/libs/json/doc/html/json/ref/boost__json__serialize/overload5.html

grisumbras avatar Dec 17 '24 17:12 grisumbras

How can I use it for my type?

kelbon avatar Dec 17 '24 18:12 kelbon

There appears to be some miscommunication. My first inclination was to answer with "by calling one of these functions with an object of your type". But that should be obvious. Am I missing something? Is there something special about your types that exclude them from being used with these functions?

grisumbras avatar Dec 18 '24 11:12 grisumbras

For example, if I have a type like this and want to serialize it as a JSON array of 'count' 'hello world' strings, how do I do that efficiently with such an interface?

struct my_type {
  int count;

  my_type(int c) : count(c) {}
};

kelbon avatar Dec 18 '24 11:12 kelbon
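
To make the requested output concrete, here is a minimal hand-written sketch that uses only the standard library and no Boost.JSON API. The function names append_json and example are hypothetical; it shows the target JSON for my_type, not how the library would produce it.

```cpp
#include <cassert>
#include <string>

struct my_type
{
    int count;
    my_type(int c) : count(c) {}
};

// Appends a JSON array holding `count` copies of "hello world" to `out`.
// The literal contains nothing that needs escaping, so it is appended verbatim.
inline void append_json(std::string& out, my_type const& v)
{
    out += '[';
    for (int i = 0; i < v.count; ++i)
    {
        if (i != 0)
            out += ',';   // separator between elements, none before the first
        out += "\"hello world\"";
    }
    out += ']';
}

// Usage check: two elements, written straight into the output string.
inline void example()
{
    std::string out;
    append_json(out, my_type(2));
    assert(out == "[\"hello world\",\"hello world\"]");
}
```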

Oh, this is currently not supported by either direct parsing or direct serialization. But I'm working on a feature that I think would allow such a representation.

grisumbras avatar Dec 18 '24 11:12 grisumbras

See #1025

grisumbras avatar Dec 18 '24 11:12 grisumbras

For now I am using RapidJSON for this functionality: https://rapidjson.org/classrapidjson_1_1_writer.html

It's useful, convenient, and extensible, and it also seems easy to implement (easier than JSON parsing, etc.).

kelbon avatar Dec 18 '24 12:12 kelbon

I would also welcome a SAX interface to ease porting a large existing codebase from RapidJSON to Boost JSON. When writing JSON, the SAX interface is pretty convenient if only the serialized JSON string is of interest, and it's very performant (as it avoids the extra copy of data into JSON value data structures).

https://rapidjson.org/md_doc_sax.html#Writer

In my short test, replacing RapidJSON SAX writing code with standard Boost JSON code would mean a big performance regression (I can share the benchmark code if you are interested). Due to the vastly different API, it would also mean a lot of work. A similar SAX interface would ease porting a lot (a more important point than performance).

That being said, I don't understand the direct serialization provided by 1.87.0, and the comment above does not take it into consideration. Does it avoid the copy, or does it call boost::json::value_from behind the scenes? Where is the documentation or an example of how to use direct serialization (besides the reference documentation)?

How would I implement the following function using direct serialization?

std::string generatJSON(const std::vector<std::string>& columnNames, const std::vector<std::vector<double>>& rows)

where I want the following resulting JSON:

{
  "ColumnNames": ["Bla", "Bla"],
  "Rows": [
    [1,2],
    [3,4]
  ]
}

jcmonnin avatar Dec 18 '24 17:12 jcmonnin

Direct operations are described here: https://www.boost.org/doc/libs/1_87_0/libs/json/doc/html/json/conversion/direct_parsing.html. Seems like I forgot to rename the section.

Serialising two objects as if they were one is not supported; you'd need something like this:

#include <boost/core/span.hpp>
#include <boost/describe/class.hpp>
#include <boost/json.hpp>
#include <string>
#include <vector>

struct rows_and_cols
{
    boost::span<std::string const> ColumnNames;
    boost::span<std::vector<double> const> Rows;
};
BOOST_DESCRIBE_STRUCT( rows_and_cols, (), (ColumnNames, Rows) );

std::string generatJSON(const std::vector<std::string>& columnNames, const std::vector<std::vector<double>>& rows)
{
  rows_and_cols data{columnNames, rows};
  return boost::json::serialize(data);
}

grisumbras avatar Dec 18 '24 18:12 grisumbras

Thanks for the example, which allows me to compare performance in my benchmark (serializing an 82MB document with the structure above; the output contains mostly numbers, which is the bottleneck in our server).

  • RapidJSON SAX: 648ms
  • RapidJSON DOM: Build Value 289ms + Serialize 723ms = 1012ms
  • Boost JSON: Build Value 510ms + Serialize 511ms = 1021ms
  • Boost JSON Direct Serialization: 485ms

Nice to see that the performance regression of porting to Boost JSON would be solved with direct serialization. There are even some nice performance gains. (As always, benchmarks are highly case-specific; different use cases might give different results.)

The API issue remains, however. It seems direct serialization relies on Boost.Describe reflection. Besides porting being a lot of work due to the different API style, there are also things that I would have trouble implementing (like omitting object properties that are at their default values to save bandwidth, generating different versions of the JSON structure depending on the client's request, ...). The use case is generating JSON on a server, where the JSON needs to follow some spec and the memory layout of the C++ structures is only loosely coupled to the required JSON format. I don't find the code with helper structs very readable compared to SAX, and it is certainly less flexible.

As the RapidJSON SAX API is pretty trivial (a dozen simple-to-use methods in a writer class), I had a quick look at whether it would be easily doable using your writer class, but the writer looks too low-level to be used externally (besides being a non-public interface). Is there any chance you would consider a RapidJSON-style SAX interface for your writer class?

jcmonnin avatar Dec 19 '24 10:12 jcmonnin
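
For readers unfamiliar with the style being requested, the following is a minimal sketch of a push-style writer, using only the standard library. The class sketch_writer, the struct settings, and the default value 3 are all hypothetical; this is neither RapidJSON's Writer nor anything in Boost.JSON, and it does no string escaping. It shows how a property at its default value can simply be skipped at the call site.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical push-style writer. Keys and strings are assumed to need no escaping.
class sketch_writer
{
    std::string& out_;
    std::vector<bool> first_;  // per open container: true until its first element is written

    // Emits a comma unless the next item is the first one in its container.
    void separate()
    {
        if (first_.empty()) return;
        if (!first_.back()) out_ += ',';
        first_.back() = false;
    }

public:
    explicit sketch_writer(std::string& out) : out_(out) {}

    void start_object() { separate(); out_ += '{'; first_.push_back(true); }
    void end_object()   { out_ += '}'; first_.pop_back(); }
    void start_array()  { separate(); out_ += '['; first_.push_back(true); }
    void end_array()    { out_ += ']'; first_.pop_back(); }
    // Only valid inside an object; the value that follows must not get a comma.
    void key(const char* k) { separate(); out_ += '"'; out_ += k; out_ += "\":"; first_.back() = true; }
    void integer(int v)     { separate(); out_ += std::to_string(v); }
    void str(const char* s) { separate(); out_ += '"'; out_ += s; out_ += '"'; }
};

// Hypothetical type whose `retries` member is omitted when it equals the default.
struct settings
{
    std::string name;
    int retries = 3;
};

inline std::string to_json(settings const& s)
{
    std::string out;
    sketch_writer w(out);
    w.start_object();
    w.key("name");
    w.str(s.name.c_str());
    if (s.retries != 3)  // skip the property entirely at its default value
    {
        w.key("retries");
        w.integer(s.retries);
    }
    w.end_object();
    return out;
}

inline void example()
{
    settings s;
    s.name = "a";
    assert(to_json(s) == R"({"name":"a"})");
    s.retries = 5;
    assert(to_json(s) == R"({"name":"a","retries":5})");
}
```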

I don't think it would be possible to retrofit such an interface to our serializer. The issue is that it would require a complete reversal of its workflow. Moreover, such a push-based approach doesn't work well with chunked output, which is central to our implementation.

That being said, every subvalue of JSON is also a full JSON value, so you could do

std::string generatJSON(const std::vector<std::string>& columnNames, const std::vector<std::vector<double>>& rows)
{
  std::string result = "{";
  
  result += json::serialize( json::string_view("ColumnNames") );
  result += ":"; 
  result += json::serialize(columnNames);
  
  result += ",";
  result += json::serialize( json::string_view("Rows") );
  result += ":"; 
  result += json::serialize(rows);
  
  result += "}";
  return result;
}

If that appears too slow (due to a lot of temporary strings), you can try using serializer directly:

void
serialize_helper(std::string& s, json::serializer& sr)
{
    char buf[1024];
        json::string_view sv;
    do
    {
        sv = sr.read(buf);
        s.append( sv.data(), sv.size() );
    } while( !sr.done() );
}

template< class T >
void
serialize(std::string& s, json::serializer& sr, T const& t)
{
    sr.reset( std::addressof(t) );
    serialize_helper(s, sr);
}

std::string generatJSON(const std::vector<std::string>& columnNames, const std::vector<std::vector<double>>& rows)
{
    unsigned char buf[256];
    json::serializer sr( {}, buf, sizeof(buf) );

    std::string result = "{";

    serialize( result, sr, json::string_view("ColumnNames") );
    result += ":";
    serialize( result, sr, columnNames);

    result += ",";
    serialize( result, sr, json::string_view("Rows") );
    result += ":";
    serialize( result, sr, rows);

    result += "}";
    return result;
}

serialize_helper can be further optimized if needed to do less copying.

grisumbras avatar Dec 19 '24 15:12 grisumbras

Firstly, you can use an output iterator (or a buffer, for more efficiency) instead of generating a std::string every time.

And you can make serialization streamable by using coroutines.

For example, here I use our Boost direct parsing with coroutines, which outperforms RapidJSON and your direct parsing implementation in my benchmark:

https://github.com/bot-motherlib/TGBM/blob/master/include/tgbm/jsons/boostjson_sax_producer.hpp

kelbon avatar Dec 19 '24 16:12 kelbon