increase depth limit
Describe the bug: A long minified JSON file seems to exceed the parser's depth limit.
To Reproduce
I get this error:
parse error: Exceeds depth limit for parsing at line 1, column 4381704
(I can't attach the original file, sorry)
Expected behavior: jq works with large files.
Environment (please complete the following information):
- OS and Version: Ubuntu 22.04 WSL
- jq version: jq-1.6
The current maximum depth is 256. What do you propose we set it to? Maybe we should make it configurable on the command line?
Yeah, I found MAX_PARSING_DEPTH. I agree with the flag, and what about allowing "unlimited", thus using dynamic memory?
Well, our parser doesn't use the call stack for its state, so we could just remove this limit altogether.
Yes, please make max depth configurable. See this relevant StackOverflow post, which has gotten 500 views in 6 months: How to increase depth limit of jq?
I think we probably don't even need a depth limit at all. It's not a stack size issue for jq.
I think the original reason for the recursion limit was some security issue IIRC
I think the original reason for the recursion limit was some security issue IIRC
Typically it's to avoid stack exhaustion. But if you're not depending on the call stack to handle nested data structure parsing, then that's not a concern.
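For illustration, here is a rough sketch (not jq's actual code) of why an iterative parser doesn't need the limit for stack safety: nesting depth is just a counter in the parser's own state, so deep input can only cost heap memory, never call-stack frames.
#!/usr/bin/env python3
# Illustrative sketch only -- not jq's parser. An iterative scanner that
# tracks bracket nesting with a plain counter instead of recursive calls,
# so arbitrarily deep input cannot overflow the call stack.
# (For brevity it ignores brackets inside strings.)
import sys

def max_depth(text, limit=None):
    depth = 0
    deepest = 0
    for ch in text:
        if ch in "[{":
            depth += 1
            deepest = max(deepest, depth)
            # An optional, explicitly imposed limit; purely a memory guard.
            if limit is not None and depth > limit:
                raise ValueError(f"exceeds depth limit {limit}")
        elif ch in "]}":
            depth -= 1
    return deepest

if __name__ == "__main__":
    print(max_depth(sys.stdin.read()))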
Would be great if one could go arbitrarily deep (as deep as memory allows). Here's the original issue that led to the current limit: https://github.com/jqlang/jq/issues/1136
I would propose at least allowing a limit to be imposed if desired, rather than just doing away with it altogether. Ideally as a run-time option, but at the very least, a build option.
In many real-world situations the depth should be within a known limit, and if the input is invalid (especially if it comes from, say, a user or a third party rather than a controlled source), it is much preferable for jq to fail than to keep going and use potentially unbounded memory.
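For what it's worth, an explicit check like that can already be approximated with existing features: in streaming mode each event carries the path to a value, and the longest path length tracks the input's nesting depth, without building the full nested value in memory. A rough sketch, where deep.json stands in for whatever suspect input you have:
> jq --stream -n '[inputs | .[0] | length] | max' deep.json
You can compare the reported depth against your own threshold before doing the real processing.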
Adding an option to change MAX_PARSING_DEPTH based on needs would be really helpful.
Just recently came across this myself. As a workaround, I've discovered that you can get around this by starting with a streaming input and then inverting it with fromstream. Note that this means you are not protected against segfaults. Here's a working example.
pathological.py:
#!/usr/bin/env python3
import sys

# Emit a minified JSON array nested `depth` levels deep on stdout.
def main():
    depth = int(sys.argv[1])
    for _ in range(depth):
        print("[", end="")
    for _ in range(depth):
        print("]", end="")

if __name__ == "__main__":
    main()
> jq --version
jq-1.7.1
The depth check prevents the "regular" path from running:
> ./pathological.py 1000 | jq '.'
jq: parse error: Exceeds depth limit for parsing at line 1, column 257
Using --stream lets you work around this limit:
> ./pathological.py 1000 | jq --stream -n 'fromstream(inputs)'
<snip>
Be careful when using this workaround, as it exposes you once again to the possibility of segfaults:
> ./pathological.py 100000 | jq --stream -n 'fromstream(inputs)'
Traceback (most recent call last):
File "/private/tmp/./pathological.py", line 13, in <module>
main()
File "/private/tmp/./pathological.py", line 10, in main
print("]", end="")
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
fish: Process 79538, 'jq' from job 1, './pathological.py 100000 | jq -…' terminated by signal SIGSEGV (Address boundary error)
I discovered this "accidentally", as my original intention was to use streaming to preprocess the input and strip out deeply nested values. It seems like the real issue here is that jq does not currently have any explicit protection against segfaults in this case. Whether or not the user has requested a greater depth limit, jq should exit cleanly on such an error.
https://github.com/itchyny/gojq handles this "gracefully". Interestingly, on my machine, I can easily parse the 100000-deep array, but doing much more than that results in a stack overflow (handled by the go runtime rather than segfaulting).
Note that I'm not personally concerned about the security of this issue with respect to untrusted input, since it relies on the jq user going into streaming mode and then blindly using fromstream on a large/deep input. Presumably you know what you're doing if you're going for the streaming escape hatch (and are also aware that you're working with large/pathological input). On the other hand, this could be an issue if untrusted input is used to construct the jq command.