
JSON is too deeply nested (SystemStackError)

Open majormoses opened this issue 7 years ago • 27 comments

I am having an issue that I can only reproduce on my Jenkins instance, so I assume it is related to the system, Ruby, and gems installed there. Here is the relevant output from my Jenkins job, with some extra information such as the Ruby version and gem list added for debugging purposes.

https://gist.github.com/majormoses/721d610a1a11c0ffde9e1a1aa594cba1

Here is the simple json that was used to test: https://gist.github.com/majormoses/e3245bf85ec2b1d1248ea2159f6b11f0

Any useful insight as to where to look would be greatly appreciated.

majormoses avatar Feb 07 '17 13:02 majormoses

This might be a bug on the OJ side. Two like it have been reported over the years: https://github.com/ohler55/oj/issues/50 https://github.com/ohler55/oj/issues/69

shortdudey123 avatar Feb 07 '17 17:02 shortdudey123

@shortdudey123 https://github.com/ohler55/oj/issues/333#issuecomment-279038941 where would you like to start?

majormoses avatar Feb 10 '17 19:02 majormoses

The referenced OJ issues have nothing to do with this. They are old and not even for the same code used by the SAJ parser. Completely a red herring.

Can someone provide me with the handler used for the SAJ parser and the JSON being parsed? Is it the same short one in the gist? https://gist.github.com/majormoses/e3245bf85ec2b1d1248ea2159f6b11f0

ohler55 avatar Feb 10 '17 19:02 ohler55

Looks like you were faster on the comment. :-)

ohler55 avatar Feb 10 '17 19:02 ohler55

Here is a simple script I tried. It ran with no problems locally. Is it representative of the use in this issue?

require 'stringio'
require 'oj'
require 'jsonlint'

json = %|{
  "id": "babrams",
  "ssh_keys": [
    "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDMCKMvMJ2yb13pHLsIPqi2xHOBKFZKa8+FM1FUqNbIxKeq3LLw+4WiXLK60DxxS6hmOnXD+FcWNaykkGLyGQeYHxsHynsXo9BPaG/ewaAp5SDU/zAIAaex15s/zvo5l+5Pq9OwXYtFRmfezk3ImCx7SZ8sMmHiFHYD8d38XBlGX53kLSFm5HLEopEvSCRTUyTj+tPIspgYR6IvCTdXnamO9FT8Rkeqw+mqjX9sVTaLuuqwQZlRFRMslrrJbSfv+7XvyKsjOsmAlkEYRlpHbUCxUh2Hc5q2Wfm+acOHPkkUPX8kLeT2vW+Bd/9LlPi9BN0dbmazGPbf5kv02MRNQNeUrdRfdzRIOG4tUEv154msF7QdEuy9W4pv9p0z2rNOqOQEw9HPhMiAkftIVGnvvGRj9+jIARIVzV5gAfVm2DQbPJClr0tGNCfzHmndt6FddawubXFPvFNrKgdC38Ts0Jzl1F3aWGHT8UyURDbezrTGpxg+Cqq4YUXIZfrrqB8nzF8qK3eMW2Tcxdy2m+fFBzQeHlozBSP55dcdjekdQrcVcwkYux4jecJ9BU++DjWtMtY93LgVL5BnHixS4ybo7loCndYkpsI6ZZm9oLVxHsjeoaM9D9iYoN28LIlALBm/dnfCh92G/H40v/X25DMIvRqcfnE31gsOCJ85A29twSC+Cw== [email protected]",
    "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCy3cbPekJYHAIa8J1fOr2iIqpx/7pl4giJYAG7HCfsunRRUq3dY1KhVw1BlmMGIDzNwcuNVIfBS5HS/wREqbHMXxbwAjWrMwUofWd09CTuKJZiyTLUC5pSQWKDXZrefH/Fwd7s+YKk1s78b49zkyDcHSnKxjN+5veinzeU+vaUF9duFAJ9OsL7kTDEzOUU0zJdSdUV0hH1lnljnvk8kXHLFl9sKS3iM2LRqW4B6wOc2RbXUnx+jwNaBsq1zd73F2q3Ta7GXdtW/q4oDYl3s72oW4ySL6TZfpLCiv/7txHicZiY1eqc591CON0k/Rh7eR7XsphwkUstoUPQcBuLqQPA529zBigD7A8PBmeHISxL2qirWjR2+PrEGn1b0yu8IHHz9ZgliX83Q4WpjXvJ3REj2jfM8hiFRV3lA/ovjQrmLLV8WUAZ8updcLE5mbhZzIsC4U/HKIJS02zoggHGHZauClwwcdBtIJnJqtP803yKNPO2sDudTpvEi8GZ8n6jSXo/N8nBVId2LZa5YY/g/v5kH0akn+/E3jXhw4CICNW8yICpeJO8dGYMOp3Bs9/cRK8QYomXqgpoFlvkgzT2h4Ie6lyRgNv5QnUyAnW43O5FdBnPk/XZ3LA462VU3uOfr0AQtEJzPccpFC6OCFYWdGwZQA/r1EZQES0yRfJLpx+uZQ== babrams@babrams-Serval-WS"
  ],
  "htpasswd": "$1$i2xUX9a4$6LwYbCk4K6JErTDdaiZy50",
  "groups": [
    "devops"
  ],
  "shell": "\/bin\/bash",
  "comment": "Ben Abrams"
}|

linter = JsonLint::Linter.new
linter.check_stream(StringIO.new(json))

puts "errors: #{linter.errors}"

ohler55 avatar Feb 10 '17 19:02 ohler55

@majormoses Sorry to hear you're running into issues.

I tried reproducing this issue under OS X 10.12 with the same Ruby version, bundler version, and set of gems you're using in your Gemfile, but the command completes successfully.

I also tried reproducing in a trusty64 virtual machine, but that also completed successfully.

It seems like some detail is missing here. I put a script in a gist that might help collect some forensics. Mind running that on your jenkins instance and sending the output over?

https://gist.github.com/dougbarth/84e6ccc3825d92aaad61fc2ab4e7fd59

Reproducing this in a clean environment (preferably in a VM) might be possible with guidance from that output.

dougbarth avatar Feb 10 '17 20:02 dougbarth

@dougbarth https://gist.github.com/majormoses/721d610a1a11c0ffde9e1a1aa594cba1#file-ruby_details

majormoses avatar Feb 13 '17 22:02 majormoses

What is the status on this? Anyone?

ohler55 avatar Feb 20 '17 03:02 ohler55

waiting for further troubleshooting advice...

majormoses avatar Feb 20 '17 16:02 majormoses

@dougbarth ping any thoughts?

majormoses avatar Apr 14 '17 18:04 majormoses

@dougbarth @ohler55 I tried messing with this again on our Jenkins instance, and after playing around I found something curious: it is only reproducible when run through Make (though sadly I cannot replicate it locally). Here is a gist that might provide some more insight: https://gist.github.com/majormoses/a83a68426af45931adc0f5e0d466c305

majormoses avatar Apr 14 '17 19:04 majormoses

So I think I found it! It actually was the stack size. After bumping it, it worked. I updated the comments on the gist to reflect the process. My guess is that I did not see this locally because I have a newer version of make that is likely more efficient?

majormoses avatar Apr 14 '17 20:04 majormoses

Must be a very small stack size. The JSON is not very deep at all. Maybe the Jenkins VM or machine is small.
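
For anyone wanting to check the "not very deep" claim on their own data, here is a minimal sketch that measures container nesting depth using Ruby's standard json library. The helper name json_depth is made up for illustration, and the sample document is an abbreviated stand-in shaped like the JSON in the gist, not the gist itself.

```ruby
require 'json'

# Hypothetical helper: counts how many levels of objects/arrays are nested.
# Scalars (strings, numbers, true/false/nil) count as depth 0.
def json_depth(value)
  case value
  when Hash
    1 + (value.values.map { |v| json_depth(v) }.max || 0)
  when Array
    1 + (value.map { |v| json_depth(v) }.max || 0)
  else
    0
  end
end

# Abbreviated sample with the same shape as the data bag item in this thread.
doc = JSON.parse('{"id": "babrams", "groups": ["devops"], "shell": "/bin/bash"}')
puts json_depth(doc) # => 2
```

An object holding arrays of strings comes out at depth 2, which is nowhere near what a parser should need a large stack for.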

ohler55 avatar Apr 14 '17 20:04 ohler55

It was 8192 and I bumped it to 16384.

majormoses avatar Apr 14 '17 20:04 majormoses

I still think there is a bug, but it's likely within that version of make itself.

majormoses avatar Apr 14 '17 20:04 majormoses

Is that in bytes? If so, that is more like an embedded processor setting.

ohler55 avatar Apr 14 '17 20:04 ohler55

Yes, that's in bytes.

majormoses avatar Apr 14 '17 20:04 majormoses

Wow, that is extreme. Glad you found the issue.

ohler55 avatar Apr 14 '17 20:04 ohler55

Ya, I don't like this "fix" at all; an application should very rarely need this level of recursion. Hopefully we will be able to move to a newer version of Ubuntu soon and see whether we can replicate it with a newer version of make. Do you think we should open new issues to do some profiling and see if anything turns up?

majormoses avatar Apr 14 '17 20:04 majormoses

With the JSON you provided the stack depth is not deep at all, but on a 64-bit machine 8192 bytes does not give you very many pointers and variables. A stack allocation of 4096 bytes for a buffer uses up half the stack. 4096 bytes is a single page. Usually the stack size is measured in MB.

ohler55 avatar Apr 14 '17 20:04 ohler55

@ohler55 sorry, I misspoke; that is in KB.

majormoses avatar Apr 14 '17 21:04 majormoses

$ ulimit -a | grep stack
stack size              (kbytes, -s) 8192
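
The same limit can also be read from inside the Ruby process itself, which removes any doubt about units or about which limit the process actually inherited. A small sketch, assuming a Unix-like system; the format_limit helper is a made-up name. Process.getrlimit reports bytes, while ulimit -s reports kilobytes, so the value is divided by 1024 to make the two comparable.

```ruby
# Hypothetical helper: renders a byte limit the way `ulimit -s` would (in KB).
def format_limit(bytes)
  return "unlimited" if bytes == Process::RLIM_INFINITY

  "#{bytes / 1024} KB"
end

# Returns [soft_limit, hard_limit] in bytes for the current process.
soft, hard = Process.getrlimit(Process::RLIMIT_STACK)

puts "stack soft limit: #{format_limit(soft)}"
puts "stack hard limit: #{format_limit(hard)}"
```

With the setting shown above, the soft limit would print as 8192 KB, i.e. 8 MB.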

majormoses avatar Apr 14 '17 21:04 majormoses

Glad to hear you found a lead on the issue. I tried recreating the problem myself in a Docker container: https://gist.github.com/dougbarth/160de6ac13120103bfb1bd505901f6e1

Note: I'm not using the exact same Ruby version, but it's Ruby 2.2 and is using the same set of gems

At dramatically smaller stack sizes than you're using (failures start around 38KB), the program eventually fails with a SystemStackError, but I can't get it to fail at the call to Oj.saj_parse.
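
A rough way to compare environments is to count how many nested Ruby calls succeed before SystemStackError is raised. This is only a sketch with made-up method names, and it is a coarse probe: the number depends on the Ruby version and its VM stack settings as well as on the OS stack limit, so it is only meaningful when comparing runs of the same Ruby on different machines or under different limits.

```ruby
# Recurses forever, bumping the shared counter on every call.
def descend(counter)
  counter[0] += 1
  descend(counter)
end

# Hypothetical probe: returns the number of nested calls reached
# before Ruby raised SystemStackError.
def max_recursion_depth
  counter = [0]
  descend(counter)
rescue SystemStackError
  counter[0]
end

puts "nested calls before SystemStackError: #{max_recursion_depth}"
```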

8192 KB seems to be the default stack size limit, so I'm not sure why you're running into this issue at that size.

It seems like this issue must be specific to something on your Jenkins server.

dougbarth avatar Apr 14 '17 22:04 dougbarth

Ya it feels very wrong to me to increase it (at least in this case) but I can not find anything on the Jenkins node pointing to anything enlightening. I will try to spend some time next week if I can spare it to try digging deeper.

@dougbarth could you verify what version of make you tested with? So far the only thing that makes even a shred of sense to me is that a bug in the version of Make on our Jenkins node affects this in a very odd way.

majormoses avatar Apr 14 '17 22:04 majormoses

Looks to be the same version as your Jenkins server.

root@506320a3eb4e:/jsonlint# make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for x86_64-pc-linux-gnu

dougbarth avatar Apr 14 '17 22:04 dougbarth

hmm that kills my theory...

majormoses avatar Apr 14 '17 22:04 majormoses

So I think I figured out what was going on, and I don't think it was caused directly by this gem. It has to do with how our Makefile was structured. I was able to reproduce it on Travis as well, so it was no longer a wonky Jenkins in question. Basically, every time you call a target from another target it spawns another make process. It looked something like this:

rubocop:
  # do my rubocop things
foodcritic:
  # do my foodcritic things
chefjsonlint:
  # do my json linting of chef objects
ci: rubocop foodcritic chefjsonlint

My guess is that by the time it gets through spawning all those make processes, the stack usage is already too close to normal/sane limits before the linter even runs. When I have some more time later I will try to profile it to better understand how much is being used by what.
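
As a starting point for that profiling, a script run from the Makefile can report how many levels of make it is sitting under. This sketch relies on the MAKELEVEL environment variable, which GNU make exports to the processes it starts and increments for each level of recursive make; the make_level helper is a made-up name, and an unset variable is treated here as "not started by make".

```ruby
# Hypothetical helper: reads GNU make's MAKELEVEL from the environment.
# Falls back to 0 when the variable is absent (process not started by make).
def make_level(env = ENV)
  Integer(env.fetch("MAKELEVEL", "0"))
end

puts "make nesting level: #{make_level}"
```

Calling this at the top of the lint step, once from each target, would show whether the failing run really happens deeper in the make process tree than the passing ones.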

majormoses avatar Nov 05 '17 05:11 majormoses