Question about SimdJsonParser and JsonValue.

Open ZhaiMo15 opened this issue 1 year ago • 3 comments

I'm running the code below:

import java.nio.charset.StandardCharsets;
import org.simdjson.JsonValue;
import org.simdjson.SimdJsonParser;

// Wrapper class added so the snippet compiles; the class name is arbitrary.
public class SharedParserDemo {
    public static void main(String[] args) {
        // One parser instance, reused for both documents.
        SimdJsonParser simdJsonParser = new SimdJsonParser();

        String str1 = "{\"a\": \"1\", \"b\": \"11\", \"c\": \"111\"}";
        byte[] buffer1 = str1.getBytes(StandardCharsets.UTF_8);
        JsonValue simdJsonValue1 = simdJsonParser.parse(buffer1, buffer1.length);
        System.out.println("a = " + simdJsonValue1.get("a").toString() +
                           ", b = " + simdJsonValue1.get("b").toString() +
                           ", c = " + simdJsonValue1.get("c").toString());

        String str2 = "{\"a\": \"2\", \"b\": \"22\", \"c\": \"222\"}";
        byte[] buffer2 = str2.getBytes(StandardCharsets.UTF_8);
        JsonValue simdJsonValue2 = simdJsonParser.parse(buffer2, buffer2.length);
        System.out.println("a = " + simdJsonValue2.get("a").toString() +
                           ", b = " + simdJsonValue2.get("b").toString() +
                           ", c = " + simdJsonValue2.get("c").toString());

        // Read the FIRST value again, after the second parse.
        System.out.println("a = " + simdJsonValue1.get("a").toString() +
                           ", b = " + simdJsonValue1.get("b").toString() +
                           ", c = " + simdJsonValue1.get("c").toString());
    }
}

The output is:

a = 1, b = 11, c = 111
a = 2, b = 22, c = 222
a = 2, b = 22, c = 222

It looks like all JsonValue instances share the same buffer when they are parsed by the same parser. What can I do to keep an independent JsonValue for each buffer (JSON string)?

Also, I don't think creating a new SimdJsonParser for each JSON string is a good idea, because it costs performance and memory.

ZhaiMo15 • Jan 03 '24 09:01

It looks like all JsonValue instances share the same buffer when they are parsed by the same parser. What can I do to keep an independent JsonValue for each buffer (JSON string)?

This is a limitation/property of simdjson. Currently, the only option is to extract the necessary data from JsonValue and store it in some external data structure. Perhaps a solution for this would be what you described in #35. By the way, what is your use case? Why do you need to keep JsonValues between runs of parse?
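
The workaround described above (copy what you need out of a JsonValue before the parser is used again) can be illustrated with a small model. ToyParser and ToyValue below are hypothetical stand-ins, not simdjson-java classes; they only assume, as the question suggests, that a value is a view over state the parser overwrites on every parse call.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a parser that reuses one internal buffer across parse() calls.
// This is NOT simdjson-java code; it only models the sharing behaviour discussed above.
class ToyParser {
    private byte[] shared = new byte[0];
    private int length;

    // Copies the input into the reused buffer and returns a view over it.
    ToyValue parse(byte[] input, int len) {
        if (shared.length < len) {
            shared = new byte[len];
        }
        System.arraycopy(input, 0, shared, 0, len);
        length = len;
        return new ToyValue(this);
    }

    String currentText() {
        return new String(shared, 0, length, StandardCharsets.UTF_8);
    }
}

// A view: it holds no data of its own, only a reference back to the parser.
class ToyValue {
    private final ToyParser owner;

    ToyValue(ToyParser owner) {
        this.owner = owner;
    }

    // Reads from the parser's buffer at call time, so a later parse() changes the result.
    String text() {
        return owner.currentText();
    }
}

class CopyOutDemo {
    // Returns {copy taken before the second parse, what the first view reports afterwards}.
    static String[] run() {
        ToyParser parser = new ToyParser();
        byte[] first = "{\"a\": \"1\"}".getBytes(StandardCharsets.UTF_8);
        byte[] second = "{\"a\": \"2\"}".getBytes(StandardCharsets.UTF_8);

        ToyValue v1 = parser.parse(first, first.length);
        String saved = v1.text();            // independent copy, taken before the next parse
        parser.parse(second, second.length); // overwrites the shared buffer
        return new String[] { saved, v1.text() };
    }
}
```

In this model the saved String keeps the first document's content, while the old view now reports the second document. With simdjson-java, the equivalent step would presumably be to call toString() on the fields you need (as the code in the question does) and store those Strings before calling parse again.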

Also, I don't think creating a new SimdJsonParser for each JSON string is a good idea, because it costs performance and memory.

Definitely not a good idea. An instance of SimdJsonParser is meant to be reused within a single thread.
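
One common way to get a single reusable instance per thread is a ThreadLocal. The sketch below is generic: PerThread is a hypothetical helper, and P stands in for the parser type. It is one possible arrangement, not something the library prescribes.

```java
import java.util.function.Supplier;

// Hypothetical helper: lazily creates one instance of P per thread and reuses it.
class PerThread<P> {
    private final ThreadLocal<P> local;

    PerThread(Supplier<P> factory) {
        // The factory runs at most once per thread, on that thread's first get().
        this.local = ThreadLocal.withInitial(factory);
    }

    // Returns the calling thread's own instance.
    P get() {
        return local.get();
    }
}
```

Assuming the no-argument constructor used in the question, this could be wired up as new PerThread<>(SimdJsonParser::new), with each thread calling get() before it parses.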

piotrrzysko • Jan 05 '24 07:01

Why do you need to keep JsonValues between runs of parse?

For example, I have a lot of JSON documents to parse. They are passed to me by others, so I cannot know their content until I receive them. To speed up the whole parsing process, I created a hash map. With Jackson, the key is the JSON string itself and the value is the parsed Object. When the same JSON is passed to me a second time, I don't need to parse it again; I just look it up in the hash map and use get to read what I need.

I want to do the same thing with simdjson. Unfortunately, if I use JsonValue as the hash map value, then because every JSON shares a single parser, the second time I receive the same JSON I get wrong data from the hash map.

I'm not sure I described my case clearly, so here's an example. Say I receive three JSON strings in order: json_a, json_b, and json_a, and I use the JSON as the key and the JsonValue as the value. The first json_a and json_b are fine, but when I look up the value for the second json_a, the JsonValue I get back holds the data of json_b.
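
A cache like this can still work if it stores copied data instead of JsonValue. The sketch below is a minimal illustration under that assumption: ExtractedJsonCache and its parseAndExtract hook are hypothetical names, and the hook is expected to parse the JSON and copy the needed fields into plain Strings before it returns.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Cache keyed by the raw JSON text. It stores plain copies of the extracted fields,
// never parser-owned views, so later parse() calls cannot change cached entries.
class ExtractedJsonCache {
    private final Map<String, Map<String, String>> cache = new HashMap<>();
    private final Function<String, Map<String, String>> parseAndExtract;

    // parseAndExtract is a hypothetical hook: it parses the JSON and copies out
    // the needed fields as Strings before returning.
    ExtractedJsonCache(Function<String, Map<String, String>> parseAndExtract) {
        this.parseAndExtract = parseAndExtract;
    }

    Map<String, String> get(String json) {
        // Parse only on a cache miss; store an unmodifiable defensive copy.
        return cache.computeIfAbsent(json,
                j -> Collections.unmodifiableMap(new HashMap<>(parseAndExtract.apply(j))));
    }
}
```

For the sequence json_a, json_b, json_a, the hook runs twice and the third lookup returns the entry stored for the first json_a. The trade-off is that the fields to keep must be chosen at parse time, and the HashMap here is not thread-safe, which matches a parser that is confined to a single thread.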

Perhaps a solution for this would be what you described in https://github.com/simdjson/simdjson-java/issues/35

Indeed, that would be a solution. I just want to know whether there is an easier way to solve it without creating a new API.

ZhaiMo15 • Jan 08 '24 02:01

Additional conversation regarding JsonValue: https://github.com/simdjson/simdjson-java/issues/35#issuecomment-1880463656.

piotrrzysko • Apr 28 '24 04:04