domino-jackson
Deserialization gets slow with large data sets
Deserialization of large datasets can get really slow. GWT's JSONParser takes about 100ms to decode an 8MB payload, while domino-jackson takes about 9 seconds.
Here is a demo project with demo data: https://github.com/howudodat/domtest
10/17/2023 07:45:06.812 Retrieving from network
10/17/2023 07:45:06.886 validate json JSONParser len:8518427
10/17/2023 07:45:06.908 validate json Jackson
10/17/2023 07:45:15.993 decoded homes size:4400
public void onResponseReceived(Request request, Response response) {
    if (200 == response.getStatusCode()) {
        GWT.log(new Date().toString() + " validate json JSONParser len:" + response.getText().length());
        JSONValue parsed = JSONParser.parseStrict(response.getText());
        GWT.log(new Date().toString() + " validate json Jackson");
        DBResults res = DBResults_MapperImpl.INSTANCE.read(response.getText());
        if (res.success)
            GWT.log(new Date().toString() + " decoded homes size:" + res.homes.size());
    } else {
        // error handling omitted
    }
}
It looks like this is caused by the use of long values when accumulating any kind of number from the incoming JSON stream:
https://github.com/DominoKit/domino-jackson/blob/ae0b1542c4ba0d2572293896bd8eba9979eb5a5c/domino-jackson/src/main/java/org/dominokit/jackson/stream/impl/DefaultJsonReader.java#L655-L758
Nothing terrible in this at a glance, except for all of the math that has to be done on the `value` local, which is a long; in compiled GWT code longs are inherently expensive because JavaScript has no native 64-bit integers and they must be emulated. Note that this long arithmetic is used whether or not a long is being read, so as to hold an accurate view of any integer value. Instead, we may want to wait until the caller figures out what type it actually wants (if we can?), and decode the stored bytes as needed more cheaply.
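For illustration, the hot path accumulates digits into a long roughly like this (a simplified, hypothetical sketch, not the actual DefaultJsonReader code):

```java
// Simplified sketch of digit accumulation into a long, as a reader in the
// style of DefaultJsonReader does while peeking a number token.
// This is illustrative code only, not the library's implementation.
public class LongAccumulationSketch {
    // Accumulate the digits of a decimal string into a long.
    // Every iteration performs a long multiply and add, which GWT must
    // emulate in JavaScript since JS has no native 64-bit integers.
    static long accumulate(String digits) {
        long value = 0;
        for (int i = 0; i < digits.length(); i++) {
            value = value * 10 + (digits.charAt(i) - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(accumulate("8518427")); // prints 8518427
    }
}
```

In a JVM this loop is cheap; the cost only shows up once GWT lowers each long operation to emulated 64-bit arithmetic, which is why the same parse is so much slower in the browser.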
Unfortunately, this code is inherited from the upstream project that domino-jackson was forked from, gwt-jackson, so we don't have a great deal of visibility into why some decisions were made there. The tests are also pretty light in this area, so step one is probably to beef that up a bit.
Note that the JSON spec defines numbers extremely broadly - so broadly that JS isn't actually capable of evaluating a valid JSON string and keeping the precision found in the original payload. Java can do better (but not all libraries do), between long, BigInteger, and BigDecimal (but not NaN nor any infinities - so JS Number itself cannot be fully represented in JSON either), so fully understanding the flexibility of the current implementation will be important when trying to update it.
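To make the precision point concrete, here is a small sketch showing that a valid JSON number can exceed what a JS Number (an IEEE-754 double) can represent exactly, while Java's BigInteger and BigDecimal keep it intact:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class JsonNumberPrecision {
    public static void main(String[] args) {
        // 2^53 + 1 is a valid JSON number but is not exactly
        // representable as an IEEE-754 double.
        String raw = "9007199254740993";
        double asDouble = Double.parseDouble(raw); // rounds to 9007199254740992
        BigInteger asBig = new BigInteger(raw);    // exact

        System.out.println((long) asDouble); // 9007199254740992
        System.out.println(asBig);           // 9007199254740993

        // BigDecimal likewise preserves the full decimal representation.
        BigDecimal dec = new BigDecimal("0.10000000000000000001");
        System.out.println(dec.toPlainString()); // 0.10000000000000000001
    }
}
```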
At a quick read, I'm pretty sure we can get away with a vastly cheaper implementation - peekNumber's job appears to be to serve as a helper for doPeek to identify the next token, so that the nextDouble/nextInt/nextLong/etc methods can coerce/parse that token to the correct type. It seems that if an integer value is detected, it is always parsed as a long and stored in peekedLong, to be cast to the correct type as needed (if it can be done without loss of precision). To resolve this bug, peekNumber should probably be rewritten to merely store the bytes that were identified, and let them be parsed into a valid value on demand.
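One possible shape for that rewrite, purely as a sketch (LazyNumberToken and all of its methods are hypothetical names, not part of domino-jackson): peekNumber would only capture the raw characters of the token, and the nextInt/nextLong/nextDouble-style methods would parse them when the caller asks for a concrete type.

```java
import java.math.BigDecimal;

// Hypothetical sketch: store the raw number token and defer parsing until
// the caller requests a concrete type. None of these names exist in
// domino-jackson; this only illustrates the proposed approach.
public class LazyNumberToken {
    private final String raw; // characters captured while peeking the token

    public LazyNumberToken(String raw) {
        this.raw = raw;
    }

    // No long arithmetic happens unless a long is actually requested.
    public int asInt() {
        return Integer.parseInt(raw);
    }

    public long asLong() {
        return Long.parseLong(raw);
    }

    public double asDouble() {
        return Double.parseDouble(raw);
    }

    // Full-precision fallback for arbitrarily large or precise JSON numbers.
    public BigDecimal asBigDecimal() {
        return new BigDecimal(raw);
    }

    public static void main(String[] args) {
        LazyNumberToken t = new LazyNumberToken("4400");
        System.out.println(t.asInt()); // parsed only on demand
    }
}
```

The trade-off to verify is that deferred parsing still rejects malformed tokens at the same point the current implementation does, which is where beefed-up tests would pay off.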