increase depth limit

Open dgutson opened this issue 1 year ago • 13 comments

Describe the bug: A long minified JSON seems to exceed the parser's depth limit.

To Reproduce: I get this error: parse error: Exceeds depth limit for parsing at line 1, column 4381704 (I can't attach the original file, sorry)
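
A self-contained repro, since the original file can't be shared (hypothetical input; any JSON nested deeper than 256 levels triggers the same error):

> python3 -c 'print("[" * 300 + "]" * 300)' | jq '.'
jq: parse error: Exceeds depth limit for parsing at line 1, column 257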

Expected behavior: jq works with large, deeply nested files.

Environment (please complete the following information):

  • OS and Version: Ubuntu 22.04 (WSL)
  • jq version: jq-1.6

dgutson avatar Aug 15 '23 19:08 dgutson

The current maximum depth is 256. What do you propose we set it to? Maybe we should make it configurable on the command-line?
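
For reference, the limit is the MAX_PARSING_DEPTH constant in the jq sources. Assuming a source checkout, it can be located like this (output trimmed; the exact file and value may vary between versions):

> grep -r MAX_PARSING_DEPTH src/
src/jv_parse.c:#define MAX_PARSING_DEPTH (256)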

nicowilliams avatar Aug 15 '23 20:08 nicowilliams

Yeah, I found MAX_PARSING_DEPTH. I agree with adding the flag; what about also allowing "unlimited", using dynamic memory?
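
Until a flag exists, here's a build-time workaround sketch for anyone compiling from source (this assumes the define reads exactly as below in your checkout; verify before patching):

> sed -i 's/#define MAX_PARSING_DEPTH (256)/#define MAX_PARSING_DEPTH (4096)/' src/jv_parse.c
> autoreconf -i && ./configure && make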

dgutson avatar Aug 15 '23 21:08 dgutson

Well, our parser doesn't use the call stack for its state, so we could just remove this limit altogether.

nicowilliams avatar Aug 15 '23 21:08 nicowilliams

Yes, please make max depth configurable. See this relevant StackOverflow post, which has gotten 500 views in 6 months: How to increase depth limit of jq?

corneliusroemer avatar Sep 07 '23 19:09 corneliusroemer

I think we probably don't even need a depth limit at all. It's not a stack size issue for jq.

nicowilliams avatar Sep 07 '23 20:09 nicowilliams

I think the original reason for the recursion limit was some security issue IIRC

corneliusroemer avatar Sep 07 '23 22:09 corneliusroemer

I think the original reason for the recursion limit was some security issue IIRC

Typically it's to avoid stack exhaustion. But if you're not depending on the call stack to handle nested data structure parsing, then that's not a concern.
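
To illustrate the contrast: a recursive-descent parser such as Python's json module exhausts the recursion-limited call stack on deep nesting, whereas a parser that keeps its nesting state on the heap has no such ceiling (illustrative comparison only, not jq code; output abbreviated):

> python3 -c 'import json; json.loads("[" * 2000 + "]" * 2000)'
Traceback (most recent call last):
  ...
RecursionError: maximum recursion depth exceeded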

nicowilliams avatar Sep 07 '23 22:09 nicowilliams

Would be great if one could go arbitrarily deep (as deep as memory allows). Here's the original issue that led to the current limit: https://github.com/jqlang/jq/issues/1136

corneliusroemer avatar Sep 07 '23 23:09 corneliusroemer

I would propose at least allowing a limit to be imposed if desired, rather than just doing away with it altogether. Ideally as a run-time option, but at the very least, a build option.

In many real-world situations the depth should be within a known limit, and if the input is invalid (especially if the input comes from, say, a user or a third party and therefore not from a controlled source), it is much preferable for jq to fail than to keep going and unnecessarily use potentially unbounded memory.
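
In the meantime, a user-level guard can be sketched with --stream, where each stream event's path length is a proxy for nesting depth (hypothetical filter; it only validates and discards the value, printing null on success):

> python3 -c 'print("[" * 1000 + "]" * 1000)' | jq -n --stream 'reduce inputs as $e (null; if ($e[0] | length) > 512 then error("input too deep") else . end)'
jq: error (at <stdin>:1): input too deep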

liquidaty avatar Dec 08 '23 22:12 liquidaty

Adding an option to change MAX_PARSING_DEPTH based on need would be really helpful.

ashokz-git avatar Feb 09 '24 15:02 ashokz-git

Just recently came across this myself. As a workaround, I've discovered that you can get around the limit by starting with a streaming input and then inverting that with fromstream. Note that this means you are not protected against segfaults. Here's a working example.

pathological.py:

#!/usr/bin/env python3

import sys

def main():
    depth = int(sys.argv[1])  # nesting depth, taken from the command line
    for _ in range(depth):
        print("[", end="")  # emit all the opening brackets...
    for _ in range(depth):
        print("]", end="")  # ...then all the closing ones, on one line

if __name__ == "__main__":
    main()

> jq --version
jq-1.7.1

The depth check prevents the "regular" path from running:

> ./pathological.py 1000 | jq '.'
jq: parse error: Exceeds depth limit for parsing at line 1, column 257

Using --stream lets you work around this limit:

> ./pathological.py 1000 | jq --stream -n 'fromstream(inputs)'
<snip>

Be careful when using this workaround, as it exposes you once again to the possibility of segfaults:

> ./pathological.py 100000 | jq --stream -n 'fromstream(inputs)'
Traceback (most recent call last):
  File "/private/tmp/./pathological.py", line 13, in <module>
    main()
  File "/private/tmp/./pathological.py", line 10, in main
    print("]", end="")
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
fish: Process 79538, 'jq' from job 1, './pathological.py 100000 | jq -…' terminated by signal SIGSEGV (Address boundary error)

I discovered this "accidentally": my original intention was to use streaming to preprocess the input and strip out deeply nested values. It seems like the real issue here is that jq does not currently have any explicit protection against segfaults in this case. Whether or not the user has requested a greater depth limit, jq should exit cleanly on such an error.
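
If the crash really is C-stack exhaustion (as the earlier discussion suggests), raising the process stack limit may postpone it; an untested sketch, with the size chosen arbitrarily:

> ulimit -s 65536   # 64 MiB stack for this shell (bash/zsh syntax)
> ./pathological.py 100000 | jq --stream -n 'fromstream(inputs)'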

bsidhom avatar Apr 25 '24 00:04 bsidhom

https://github.com/itchyny/gojq handles this "gracefully". Interestingly, on my machine, I can easily parse the 100000-deep array, but going much deeper than that results in a stack overflow (caught by the Go runtime rather than segfaulting).

bsidhom avatar Apr 25 '24 01:04 bsidhom

Note that I'm not personally concerned about the security implications of this issue with respect to untrusted input, since it relies on the jq user going into streaming mode and then blindly using fromstream on a large/deep input. Presumably you know what you're doing if you're reaching for the streaming escape hatch (and are also aware that you're working with large/pathological input). On the other hand, this could be an issue if untrusted input is used to construct the jq command.

bsidhom avatar Apr 25 '24 01:04 bsidhom