doris

[Enhancement] Support simdjson for parsing JSON documents during load

Open: eldenmoon opened this issue 2 years ago • 4 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

Description

Provide a simdjson-based parser for JSON documents in the JSON scanner.

Solution

Currently we use rapidjson to parse JSON documents. It is fast, but not as fast as simdjson. simdjson provides a parsing front-end, simdjson::ondemand, which parses JSON lazily as fields are accessed and can return a field's raw token directly from the original document. This enables three optimizations:

  • Less string copying. _write_data_to_column currently converts every value to a string literal with sprintf, and this function shows up as a hotspot in the flame graph. simdjson::to_json_string returns the raw token (a string piece) as a std::string_view into the original document, which is exactly what we need.
  • Faster field access. In _set_column_value we can iterate over the JSON object with for (auto field : object_val) { ... }, which is much faster than looking up each field by name, as objectValue.FindMember("k1") does.
  • Direct path lookup. simdjson's at_pointer interface can fetch a JSON field directly from the original document.

Below are the performance results from my stream load benchmark. Setting config::enable_simdjson_reader=true switches parsing to the simdjson reader.

[Image: stream load benchmark results]

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

eldenmoon, Aug 11 '22 02:08

Why is simdjson slower on httplogs? Are there any differences between those benchmarks?

Gabriel39, Aug 11 '22 03:08

Is the JSON in your stream load test one big array object, or 9000 lines of JSON (with read_json_by_line turned on)?

caiconghui, Aug 11 '22 03:08

@Gabriel39 Sorry, the order is wrong; I'll fix it.

eldenmoon, Aug 11 '22 05:08

@caiconghui I turned on read_json_by_line.

eldenmoon, Aug 11 '22 05:08