jackson-core icon indicating copy to clipboard operation
jackson-core copied to clipboard

JsonPointer quadratic memory use: OOME on deep inputs

Open htmldoug opened this issue 2 years ago • 6 comments

Adding https://github.com/nst/JSONTestSuite in https://github.com/rallyhealth/weePickle/pull/106 uncovered that JsonPointer memory usage is quadratic over depth. -Xmx8g is insufficient to create a JsonPointer for 100k opening arrays without an OutOfMemoryError.

JsonPointer seems to be a linked list where each node contains an _asString field containing the full path to that node.

Minimal test to repro the issue: 3764ff4b1a8dc2090dec57da5dc0cc2c7f8f520b

    // such as https://github.com/nst/JSONTestSuite/blob/master/test_parsing/n_structure_100000_opening_arrays.json
    public void testDeepJsonPointer() throws Exception {
        int DEPTH = 100000;
        String INPUT = new String(new char[DEPTH]).replace("\0", "[");
        JsonParser parser = createParser(MODE_READER, INPUT);
        try {
            while (true) {
                parser.nextToken();
            }
        } catch (Exception e) {
            JsonStreamContext parsingContext = parser.getParsingContext();
            JsonPointer jsonPointer = parsingContext.pathAsPointer(); // OOME
            String pointer = jsonPointer.toString();
            String expected = new String(new char[DEPTH - 1]).replace("\0", "/0");
            assertEquals(expected, pointer);
        }
    }

heap dump filled with JsonPointer instances, each containing a String of the full path

htmldoug avatar Dec 29 '21 03:12 htmldoug

Interesting. I will have to look into this a bit. My immediate thinking is that this might be an unfortunate side effect of attempting to retain lazily computed description for JsonPointer for common (if wasteful) use cases where JsonPointer.toString() is called multiple times.

But it sounds like here it is more directly called as a side effect, causing every single pointer "node" to create and retain its full path description.

cowtowncoder avatar Dec 29 '21 21:12 cowtowncoder

Hmmmh. Unfortunately, looking at code involved, it is not the way I thought -- actual full path is eagerly constructed. This makes changing a whole lot more difficult and probably only doable for a new minor version (2.14). Not sure if and how to address this, need to think a bit more. :-/

cowtowncoder avatar Jan 03 '22 01:01 cowtowncoder

Can I make a case for Caffeine cache? The cache could be configured to hold onto a max number of pointer strings and to remove others on a LRU basis.

If jackson were to buy into real LRU behaviour with its caches. The LRUMap could be rewritten to use a similar caffeine cache - like the one I support at https://github.com/pjfanning/jackson-caffeine-cache

pjfanning avatar Jan 16 '22 11:01 pjfanning

@pjfanning If we needed cache Caffeine is great. But in this case unfortunately caches would not really work, I think. Problem is rather that of how to remove eager construction and retaining of String representation while still allowing efficient operation for common usage.

Caches are highly problematic for this usage due to various reasons, including lack of clear ownership (JsonPointer instances should not have references back to any larger retained objects, all caches need to be owned by life-cycled entities like either JsonFactory or ObjectMapper and so on).

cowtowncoder avatar Jan 16 '22 17:01 cowtowncoder

Quick note: still thinking about how to tackle this problem. No solution yet, but one very preliminary idea is around a limit for "_asString" such that existing processing would only be applied for shorter paths (either by length of path representation or number of segments; whichever is easier to track), and different, more dynamic (and less pre-processing) approach was taken beyond that initial set. I don't have a good idea of how to implement that, starting from what exists; my specific concern is that this might require some sort of back-reference to allow dynamic generation of full path. Such references might be just fine, as long as things are kept immutable, but require some care (mutable ones would be just asking for trouble and within strict no-fly zone for me personally :) ).

Note: marking as performance related since that label covers efficiency aspects too (excessive memory usage).

cowtowncoder avatar Feb 13 '22 04:02 cowtowncoder

Actually I have a bit more specific idea to consider: when constructing an instance, create full path for the deepest level -- but then for parent levels have reference to full path and length (from beginning), "leftstr". New _length field would then be used so that:

  1. If length 1 or larger, create sub-string when needed
  2. Otherwise use String

This would avoid eagerly allocating N String representations upon construction. It would not necessarily be optimal for use cases where JsonPointer instances were retained, but it would at least reduce complexity by one dimension (from quadratic to linear).

I'll see if I can implement something along these lines: it seems doable without sacrificing backwards compatibility, and doable for 2.14 (but just to play it safe probably not for 2.13).

cowtowncoder avatar Feb 14 '22 00:02 cowtowncoder

Finally found time to fix this; will be in 2.14.0 final.

cowtowncoder avatar Oct 12 '22 20:10 cowtowncoder