Add SimdJsonParser2 based on bitindex
issue: https://github.com/simdjson/simdjson-java/issues/59
@arouel Thanks very much, I have fixed the code based on your suggestions. When the paths to parse are known in advance, SimdJsonParserWithFixPath provides better performance and supports compressing map and list values into strings. It can quickly skip paths that do not need parsing, and it avoids creating a JSON node instance for every node in the document.
Benchmark results, for reference. The environment is Species[byte, 32, S_256_BIT].
Result "org.simdjson.AParseAndSelectFixPathBenchMark.parseMultiValuesForFixPaths_Jackson": 693.528 ±(99.9%) 18.073 ops/s [Average] (min, avg, max) = (687.806, 693.528, 699.113), stdev = 4.694 CI (99.9%): [675.455, 711.601] (assumes normal distribution)
Result "org.simdjson.ParseAndSelectFixPathBenchMark.parseMultiValuesForFixPaths_SimdJson": 2258.495 ±(99.9%) 41.596 ops/s [Average] (min, avg, max) = (2242.400, 2258.495, 2269.942), stdev = 10.802 CI (99.9%): [2216.899, 2300.091] (assumes normal distribution)
Result "org.simdjson.ParseAndSelectFixPathBenchMark.parseMultiValuesForFixPaths_SimdJsonParserWithFixPath": 4075.984 ±(99.9%) 104.804 ops/s [Average] (min, avg, max) = (4029.568, 4075.984, 4100.273), stdev = 27.217 CI (99.9%): [3971.180, 4180.789] (assumes normal distribution)
How is this different from On-Demand parsing available in the c++ simdjson version?
I introduced a form of on-demand parsing in https://github.com/simdjson/simdjson-java/pull/51 (see: org.simdjson.OnDemandJsonIterator). The API requires specifying a target class to which the JSON will be parsed. However, it should be relatively easy to extend this to support a DOM-like API (JsonValue, JsonIterator, etc.), which I believe is more intuitive than introducing syntax for accessing fields and then returning an array of strings with the corresponding values.
@piotrrzysko I agree with you, a DOM-like API (JsonValue, JsonIterator, etc.) would be very helpful in use cases where only specific parts of the JSON are conditionally relevant, so that a mapping to an object would cause allocation that you want to avoid.
Can you guide us a bit, so that we can prepare a PR?
@heykirby I just want to share some thoughts/questions:
With some minor API changes in simdjson-java, could we keep the SimdJsonParserWithFixPath in another codebase, or could it live in a contribution module, since it is tailored to a very specific use case?
Isn't a record JsonNode sufficient compared to using lombok?
@arouel Thanks, the unused imports have been removed.
Hello piotrrzysko, I have used on-demand parsing. It is very convenient and efficient for deserializing JSON strings into Java classes, and it is also the approach offered by many mainstream JSON SDKs. However, it requires defining a Java class before a field can be parsed, which is inconvenient for users, especially for deep paths. For example, to get the field at $.a.b.c.d, we first need to define class a { class b { class c { class d } } } and only then can we parse the value. In addition, every time a JSON string is parsed, a class instance has to be created for each node, which may affect performance on large-scale data.
With SimdJsonParserWithFixPath, if we want the values for multiple paths, e.g. [$.a.c, $.a, $.a.d, $.b], we only need to provide the JSON paths. The usage is similar to Hive's user-defined function json_tuple. It also supports obtaining the values of a container object's children while obtaining the compressed string value of the container object itself.
The path tree is created only once, during initialization, and the result array can be reused each time a JSON string is parsed. In scenarios with large amounts of data, this avoids repeatedly creating and destroying class instances, which should give some performance advantage.
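To make the "path tree built once, result array reused" idea concrete, here is a minimal sketch. It is not the code from this PR: the class name PathTreeSketch, its methods, and the use of nested java.util.Map objects as a stand-in for the parser's token stream are all assumptions made for illustration. Container values are rendered with String.valueOf, which is not JSON; a real implementation would presumably return the raw JSON text instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT SimdJsonParserWithFixPath.
final class PathTreeSketch {

    // One trie node per path segment; resultIndex >= 0 marks a requested path.
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        int resultIndex = -1;
    }

    private final Node root = new Node();
    private final String[] results; // allocated once, reused on every call

    // Builds the path tree once. Paths are assumed to look like "$.a.b".
    PathTreeSketch(String... paths) {
        results = new String[paths.length];
        for (int i = 0; i < paths.length; i++) {
            Node node = root;
            for (String segment : paths[i].substring(2).split("\\.")) {
                node = node.children.computeIfAbsent(segment, k -> new Node());
            }
            node.resultIndex = i;
        }
    }

    // Nested Maps stand in for a parsed document (an assumption of this sketch).
    String[] select(Map<String, Object> document) {
        Arrays.fill(results, null); // clear values left over from the previous call
        walk(root, document);
        return results;
    }

    @SuppressWarnings("unchecked")
    private void walk(Node node, Map<String, Object> object) {
        for (Map.Entry<String, Object> entry : object.entrySet()) {
            Node child = node.children.get(entry.getKey());
            if (child == null) {
                continue; // field is not on any requested path: skip the whole subtree
            }
            Object value = entry.getValue();
            if (child.resultIndex >= 0) {
                results[child.resultIndex] = String.valueOf(value);
            }
            if (!child.children.isEmpty() && value instanceof Map) {
                walk(child, (Map<String, Object>) value);
            }
        }
    }
}
```

With paths such as "$.a.c", "$.a", "$.a.d", "$.b", the constructor runs once and each select call fills the same String[] in path order, so no per-node objects are allocated while selecting.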
Hi, sorry for the delayed reply.
@heykirby
What I meant was that we can introduce on-demand parsing for a DOM-like API, which would significantly reduce the need for creating new objects. In fact, we could have a single instance of something like OnDemandJsonValue, which would be mutable and traverse a parsed JSON under the hood (likely leveraging org.simdjson.OnDemandJsonIterator).
The schema-based API you’re referring to is simply using logic that could potentially be utilized by the on-demand DOM API as well.
@arouel
Can you guide us a bit, so that we can prepare a PR?
I’d be happy to help. Perhaps I could start by creating a skeleton of the on-demand DOM API.
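The single mutable instance idea described above can be sketched as a flyweight that is re-pointed at each value instead of being allocated per value. This is only an illustration under stated assumptions: MutableValueCursor and its methods are hypothetical names, not the simdjson-java API, and the value offsets are supplied by the caller here, whereas a real on-demand DOM would obtain them from something like org.simdjson.OnDemandJsonIterator.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical flyweight; NOT part of simdjson-java.
final class MutableValueCursor {
    private byte[] buffer;
    private int start; // inclusive offset of the value
    private int end;   // exclusive offset of the value

    // Re-points this same instance at another value; no new object is created.
    MutableValueCursor reset(byte[] buffer, int start, int end) {
        this.buffer = buffer;
        this.start = start;
        this.end = end;
        return this;
    }

    boolean isString() {
        return buffer[start] == '"';
    }

    // Parses an integer such as -42; assumes the span holds only a sign and digits.
    long asLong() {
        int i = start;
        boolean negative = buffer[i] == '-';
        if (negative) {
            i++;
        }
        long value = 0;
        for (; i < end; i++) {
            value = value * 10 + (buffer[i] - '0');
        }
        return negative ? -value : value;
    }

    // Returns the text between the quotes; escape sequences are not handled here.
    String asString() {
        return new String(buffer, start + 1, end - start - 2, StandardCharsets.UTF_8);
    }
}
```

A caller iterating over an array would call reset once per element and read the value immediately, since the next reset overwrites the cursor's position. That trade-off (no allocation, but values must be consumed before moving on) is the main design consequence of a mutable DOM-like value.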
Thanks, piotrrzysko. This has long been an expected feature.
@piotrrzysko I submitted a new PR, could you give me some guidance? https://github.com/simdjson/simdjson-java/pull/63