
Add configurable "document/processing limits" for JSON parser

cowtowncoder opened this issue

(note: related to/inspired by https://github.com/FasterXML/jackson-databind/issues/2816)

Some aspects of input documents are open to abuse: a malicious sender can craft documents specifically designed to overload the server. This includes things like:

  1. Ridiculously deeply nested documents (it only takes 2 characters to create an Array context, so an 8 kB document can have 4,000 levels of nesting; see the sketch after this list)
  2. Very long documents (in general): while the streaming decoder has no problem with length per se, higher-level databind can easily exhaust memory even with fairly normal amplification (that is, 10 megabytes of input content can result in a 100-megabyte data structure)
  3. Very long names: Jackson's name decoding is optimized for short names, and while it can handle any length (within the bounds of available memory), performance characteristics are not great beyond, say, a couple of thousand characters
  4. Very long numbers: numbers with a textual length in the tens of thousands of characters might be problematic for some uses
  5. (possibly?) Huge number of properties per JSON Object -- for some use cases, constructing data structures with tens or hundreds of thousands of distinct keys can be problematic
  6. (possibly?) Extra-long text segments (megabytes of content) can similarly become problematic -- this could also result from broken content (e.g. a missing closing quote)

Although the streaming parser can typically handle many of these cases quite well, they can be very problematic for higher-level processing -- and even for streaming when processing is highly parallel.
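To make point 1 concrete, here is a tiny stand-alone sketch (plain Java, illustrative only) of the amplification: two characters per level means a few kilobytes of input are enough for thousands of nesting levels.

```java
// Illustration only: builds a ~8 kB JSON document consisting of nothing
// but 4,000 nested Array contexts ("[" then "]", 2 characters per level).
public class DeepNestingExample {
    static String deeplyNested(int levels) {
        StringBuilder sb = new StringBuilder(levels * 2);
        for (int i = 0; i < levels; i++) sb.append('[');
        for (int i = 0; i < levels; i++) sb.append(']');
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc = deeplyNested(4000);
        System.out.println(doc.length()); // 8000 characters, 4000 nesting levels
    }
}
```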

So. It would be good to create a configurable set of options that:

  1. Default to a sane, safe set of limits for the likely problematic cases (e.g. limit nesting to what is known to typically fit with respect to stack frames; limit maximum property name length)
  2. Leave the more speculative limits (e.g. text length) unlimited (or very high)
  3. Offer a simple way to configure limits (possibly only per JsonFactory, although it'd be really nice if per-parser overrides were possible)

Further reading -- some relevant material:

  • https://github.com/zio/zio-json#security outlines some general problem areas (overlaps with some of points above)

cowtowncoder · Aug 23 '20

Since 2.13 has been released, marking this as 2.14 (earliest) due to the need for API additions and possible compatibility concerns for some maximum limits.

Also: I think this issue is still important even if databind tackles some of the more immediate concerns (via https://github.com/FasterXML/jackson-databind/issues/2816).

cowtowncoder · Feb 13 '22

Some thinking out loud:

  • Limits for total input document size can probably be checked at the points where input blocks are read (blocking) or fed (async): this is what Woodstox does, and it appears to work beautifully with minimal overhead. Limits probably need to be in units of whatever the input unit is (byte/char); a rough sketch of such a block-level check follows this list
  • Limit for nesting is a much bigger challenge, but my first instinct is to integrate this into JsonStreamContext/JsonReadContext (2.x) (TokenStreamContext in 3.0) -- that is, check it when trying to add the next context. This can get tricky due to the inability to throw a checked JsonProcessingException (in 2.x), but we'll cross that bridge if and when we get that far.... there's also the obvious challenge of how to pass the limit configuration in the first place.
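A minimal sketch of the block-level size check from the first bullet above (hypothetical code, not actual jackson-core internals; all names are placeholders):

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: the running total is updated once per input block
// read, so the per-character hot path stays untouched -- the approach
// described above for Woodstox.
class BlockSizeCheckingLoader {
    private final long maxDocumentLength; // limit in input units (bytes here)
    private long totalRead;

    BlockSizeCheckingLoader(long maxDocumentLength) {
        this.maxDocumentLength = maxDocumentLength;
    }

    int loadMore(InputStream in, byte[] buffer) throws IOException {
        int count = in.read(buffer, 0, buffer.length);
        if (count > 0) {
            totalRead += count;
            if (maxDocumentLength > 0L && totalRead > maxDocumentLength) {
                throw new IOException("Input document exceeds maximum allowed length of "
                        + maxDocumentLength + " bytes");
            }
        }
        return count;
    }
}
```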

As to configuration settings, there probably needs to be a shared base type (StreamProcessingLimits) with a concrete implementation, and a singleton "default"/"empty" instance to feed in when no explicit configuration is used -- with format-specific extensions allowed. It should probably use Builder-style construction, with limits instances fully immutable (obviously) so they are shareable.
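A minimal sketch of that shape, where StreamProcessingLimits is just the placeholder name from this comment and the names and default values are purely illustrative, not an actual API:

```java
// Sketch of the described design: builder-constructed, fully immutable,
// with a shareable singleton used when no explicit configuration is given.
public class StreamProcessingLimits {
    private static final StreamProcessingLimits DEFAULTS = builder().build();

    private final int maxNestingDepth;
    private final int maxNameLength;
    private final long maxDocumentLength; // negative means "unlimited"

    protected StreamProcessingLimits(Builder b) {
        maxNestingDepth = b.maxNestingDepth;
        maxNameLength = b.maxNameLength;
        maxDocumentLength = b.maxDocumentLength;
    }

    public static StreamProcessingLimits defaults() { return DEFAULTS; }
    public static Builder builder() { return new Builder(); }

    public int getMaxNestingDepth() { return maxNestingDepth; }
    public int getMaxNameLength() { return maxNameLength; }
    public long getMaxDocumentLength() { return maxDocumentLength; }

    // Format backends could extend both the limits type and the Builder.
    public static class Builder {
        private int maxNestingDepth = 1000;   // illustrative default wrt stack depth
        private int maxNameLength = 50_000;   // illustrative default for names
        private long maxDocumentLength = -1L; // speculative limit left unlimited

        public Builder maxNestingDepth(int v) { maxNestingDepth = v; return this; }
        public Builder maxNameLength(int v) { maxNameLength = v; return this; }
        public Builder maxDocumentLength(long v) { maxDocumentLength = v; return this; }
        public StreamProcessingLimits build() { return new StreamProcessingLimits(this); }
    }
}
```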

One immediate concern I have, however, is that of adding state to JsonReadContext: it seems we need both the current level (an int) and the configuration settings, the latter passed when creating the "root" instance and then handed down to children. An alternative would be to require the config to be passed on every "createXxx()" call -- this would avoid the need to hold on to a reference (saving one field) but would be much more intrusive, as callers would need to be changed. So it is probably better to add the 2 fields.
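Roughly what the "add 2 fields" approach could look like, as a simplified sketch (not the real JsonReadContext; it reuses the hypothetical StreamProcessingLimits from above):

```java
// Simplified sketch only: the limits reference is handed to the root context
// and propagated to children; the depth check happens when a child context
// is created.
class ReadContextSketch {
    private final ReadContextSketch parent;
    private final int nestingDepth;                // field 1: current level
    private final StreamProcessingLimits limits;   // field 2: shared configuration

    private ReadContextSketch(ReadContextSketch parent, int nestingDepth,
            StreamProcessingLimits limits) {
        this.parent = parent;
        this.nestingDepth = nestingDepth;
        this.limits = limits;
    }

    static ReadContextSketch createRootContext(StreamProcessingLimits limits) {
        return new ReadContextSketch(null, 0, limits);
    }

    ReadContextSketch createChildArrayContext() {
        int depth = nestingDepth + 1;
        if (depth > limits.getMaxNestingDepth()) {
            // In 2.x this cannot surface as a checked JsonProcessingException from
            // here, so it would be an unchecked exception (or the parser would have
            // to perform the check before calling in).
            throw new IllegalStateException("Maximum nesting depth ("
                    + limits.getMaxNestingDepth() + ") exceeded");
        }
        return new ReadContextSketch(this, depth, limits);
    }
}
```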

cowtowncoder · Feb 14 '22

Will not happen for 2.14 but should be the first thing for 2.15.

cowtowncoder · Oct 23 '22

Implemented for inclusion in 2.15:

  • Maximum number lengths: #827
  • Maximum String value length: #863
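For reference, these ship as part of the StreamReadConstraints introduced in 2.15; a minimal usage sketch follows (the limit values are arbitrary examples, and method names should be double-checked against the 2.15 javadocs):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;

// Usage sketch for the two limits above; values are arbitrary examples,
// not the shipped defaults.
public class ConstraintsExample {
    public static void main(String[] args) {
        StreamReadConstraints constraints = StreamReadConstraints.builder()
                .maxNumberLength(1_000)       // #827: maximum number token length
                .maxStringLength(5_000_000)   // #863: maximum String value length
                .build();
        JsonFactory factory = JsonFactory.builder()
                .streamReadConstraints(constraints)
                .build();
        // Parsers created from this factory enforce the configured limits.
    }
}
```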

cowtowncoder · Feb 14 '23

@cowtowncoder are there any more of these that you want to do for 2.15?

pjfanning · Feb 22 '23

The total size could readily be limited using stream wrappers like those in https://github.com/pjfanning/json-size-limiter/tree/main/src/main/java/com/github/pjfanning/json/util (that code is itself borrowed), and it should also be possible to write ByteBuffer equivalents. There is an argument that many web service frameworks already support limiting the acceptable size of inputs, and that users should prefer those approaches: the earlier a large input is spotted, the better.
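For illustration, a minimal wrapper along those lines -- a sketch in the spirit of the linked utility classes, not the linked code itself:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Counts bytes as they are read and fails fast once the configured cap is
// exceeded; the wrapped stream can then be handed to any JsonFactory.
public class SizeLimitingInputStream extends FilterInputStream {
    private final long maxBytes;
    private long bytesRead;

    public SizeLimitingInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            checkLimit(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int count = super.read(buf, off, len);
        if (count > 0) {
            checkLimit(count);
        }
        return count;
    }

    private void checkLimit(long count) throws IOException {
        bytesRead += count;
        if (bytesRead > maxBytes) {
            throw new IOException("Input exceeds configured maximum of "
                    + maxBytes + " bytes");
        }
    }
}
```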

pjfanning · Feb 22 '23

Yes, ideally we'd be able to target (1) by JsonStreamContext, and (2) for input and output sizes. But I don't know how timing would work.

I think the existence of stream wrappers is useful and they can work for now, even if in-core limits are eventually added.

But deeper nesting checks should definitely be added in core at some point. Just not sure if timing works for 2.15.

And field names... that's an interesting one. Not commonly reported as an attack vector, but conceivably it could be exploited the same way (... and more) as the max String token length.

cowtowncoder · Feb 22 '23

It seems like (2) is something that can be left to users. I think it is reasonable for Jackson to concentrate on the cases where comparatively few input bytes can lead to bad outcomes.

pjfanning · Feb 23 '23

@pjfanning It could be, although I implemented this with Woodstox a few years back and it really is rather easy to add. Then again, jackson-core supports decorators, so it's... even easier for users to wrap input themselves (sketched below).

But I agree it's not a high priority at this point.
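For completeness, a rough sketch of that decorator route (illustrative only; it plugs the SizeLimitingInputStream sketch from above into a JsonFactory via an InputDecorator, leaving Reader input undecorated for brevity):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.io.IOContext;
import com.fasterxml.jackson.core.io.InputDecorator;

// Every parser created by the resulting factory reads through the
// size-limiting wrapper; char-based input is passed through unchanged here.
public class SizeLimitingDecorator extends InputDecorator {
    private final long maxBytes;

    public SizeLimitingDecorator(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    @Override
    public InputStream decorate(IOContext ctxt, InputStream in) throws IOException {
        return new SizeLimitingInputStream(in, maxBytes);
    }

    @Override
    public InputStream decorate(IOContext ctxt, byte[] src, int offset, int length)
            throws IOException {
        return new SizeLimitingInputStream(
                new ByteArrayInputStream(src, offset, length), maxBytes);
    }

    @Override
    public Reader decorate(IOContext ctxt, Reader src) throws IOException {
        return src; // char-based input not limited in this sketch
    }

    public static JsonFactory factoryWithLimit(long maxBytes) {
        return JsonFactory.builder()
                .inputDecorator(new SizeLimitingDecorator(maxBytes))
                .build();
    }
}
```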

cowtowncoder · Feb 24 '23

Filed #1046 for max document length -- I think it is worth adding, after thinking it through. Hope to work on it for 2.16.

cowtowncoder · Jun 12 '23

Out of the 6 ideas, 5 are now implemented and included in 2.16.0-rc1; there are no plans yet for the last one (max # of properties per Object); a new issue can be filed if need be.

Closing this issue as completed.

cowtowncoder · Oct 26 '23