jackson-core
jackson-core copied to clipboard
JsonPointer quadratic memory use: OOME on deep inputs
Adding https://github.com/nst/JSONTestSuite in https://github.com/rallyhealth/weePickle/pull/106 uncovered that JsonPointer
memory usage is quadratic over depth. -Xmx8g
is insufficient to create a JsonPointer
for 100k opening arrays without an OutOfMemoryError
.
JsonPointer
seems to be a linked list where each node contains an _asString
field containing the full path to that node.
Minimal test to repro the issue: 3764ff4b1a8dc2090dec57da5dc0cc2c7f8f520b
// such as https://github.com/nst/JSONTestSuite/blob/master/test_parsing/n_structure_100000_opening_arrays.json
public void testDeepJsonPointer() throws Exception {
int DEPTH = 100000;
String INPUT = new String(new char[DEPTH]).replace("\0", "[");
JsonParser parser = createParser(MODE_READER, INPUT);
try {
while (true) {
parser.nextToken();
}
} catch (Exception e) {
JsonStreamContext parsingContext = parser.getParsingContext();
JsonPointer jsonPointer = parsingContext.pathAsPointer(); // OOME
String pointer = jsonPointer.toString();
String expected = new String(new char[DEPTH - 1]).replace("\0", "/0");
assertEquals(expected, pointer);
}
}
Interesting. I will have to look into this a bit. My immediate thinking is that this might be an unfortunate side effect of attempting to retain lazily computed description for JsonPointer
for common (if wasteful) use cases where JsonPointer.toString()
is called multiple times.
But it sounds like here it is more directly called as a side effect, causing every single pointer "node" to create and retain its full path description.
Hmmmh. Unfortunately, looking at code involved, it is not the way I thought -- actual full path is eagerly constructed. This makes changing a whole lot more difficult and probably only doable for a new minor version (2.14). Not sure if and how to address this, need to think a bit more. :-/
Can I make a case for Caffeine cache? The cache could be configured to hold onto a max number of pointer strings and to remove others on a LRU basis.
If jackson were to buy into real LRU behaviour with its caches. The LRUMap could be rewritten to use a similar caffeine cache - like the one I support at https://github.com/pjfanning/jackson-caffeine-cache
@pjfanning If we needed cache Caffeine is great. But in this case unfortunately caches would not really work, I think. Problem is rather that of how to remove eager construction and retaining of String representation while still allowing efficient operation for common usage.
Caches are highly problematic for this usage due to various reasons, including lack of clear ownership (JsonPointer
instances should not have references back to any larger retained objects, all caches need to be owned by life-cycled entities like either JsonFactory
or ObjectMapper
and so on).
Quick note: still thinking about how to tackle this problem. No solution yet, but one very preliminary idea is around a limit for "_asString" such that existing processing would only be applied for shorter paths (either by length of path representation or number of segments; whichever is easier to track), and different, more dynamic (and less pre-processing) approach was taken beyond that initial set. I don't have a good idea of how to implement that, starting from what exists; my specific concern is that this might require some sort of back-reference to allow dynamic generation of full path. Such references might be just fine, as long as things are kept immutable, but require some care (mutable ones would be just asking for trouble and within strict no-fly zone for me personally :) ).
Note: marking as performance related since that label covers efficiency aspects too (excessive memory usage).
Actually I have a bit more specific idea to consider: when constructing an instance, create full path for the deepest level -- but then for parent levels have reference to full path and length (from beginning), "leftstr". New _length
field would then be used so that:
- If length 1 or larger, create sub-string when needed
- Otherwise use String
This would avoid eagerly allocating N String representations upon construction. It would not necessarily be optimal for use cases where JsonPointer
instances were retained, but it would at least reduce complexity by one dimension (from quadratic to linear).
I'll see if I can implement something along these lines: it seems doable without sacrificing backwards compatibility, and doable for 2.14 (but just to play it safe probably not for 2.13).
Finally found time to fix this; will be in 2.14.0 final.