jqr
protection stack overflow errors on medium(?)-sized inputs
When using jqr on largish inputs, I'm finding frequent protection stack overflow
errors.
Here's a simple reprex with 100,000 input strings (on my machine):
> foo <- replicate(100e3, sprintf('{"a":"A","b":"B","c":"C"}')) %>%
jqr::jq("{a,b,c}")
Error: protect(): protection stack overflow
With 10,000 inputs, no error:
> replicate(10e3, sprintf('{"a":"A","b":"B","c":"C"}')) %>%
jqr::jq("{a,b,c}") %>%
str()
'jqson' chr [1:10000] "{\"a\":\"A\",\"b\":\"B\",\"c\":\"C\"}" "{\"a\":\"A\",\"b\":\"B\",\"c\":\"C\"}" "{\"a\":\"A\",\"b\":\"B\",\"c\":\"C\"}" ...
Any ideas?
Session Info
Session info ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
setting value
version R version 3.5.1 (2018-07-02)
system x86_64, darwin16.7.0
ui unknown
language (EN)
collate en_US.UTF-8
tz America/Denver
date 2019-01-18
Packages -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
package * version date source
assertthat 0.2.0 2017-04-11 CRAN (R 3.5.1)
backports 1.1.2 2017-12-13 CRAN (R 3.5.1)
base * 3.5.1 2018-07-03 local
bindr 0.1.1 2018-03-13 CRAN (R 3.5.1)
bindrcpp 0.2.2 2018-03-29 CRAN (R 3.5.1)
broom 0.5.0 2018-07-17 CRAN (R 3.5.1)
cellranger 1.1.0 2016-07-27 CRAN (R 3.5.1)
cli 1.0.0 2017-11-05 CRAN (R 3.5.1)
colorspace 1.3-2 2016-12-14 CRAN (R 3.5.1)
compiler 3.5.1 2018-07-03 local
craftsy.utils 1.3.1 2018-08-29 local
crayon 1.3.4 2017-09-16 CRAN (R 3.5.1)
datasets * 3.5.1 2018-07-03 local
devtools 1.13.6 2018-06-27 CRAN (R 3.5.1)
digest 0.6.18 2018-10-10 cran (@0.6.18)
dplyr * 0.7.6 2018-06-29 CRAN (R 3.5.1)
forcats * 0.3.0 2018-02-19 CRAN (R 3.5.1)
ggplot2 * 3.0.0 2018-07-03 CRAN (R 3.5.1)
glue 1.3.0 2018-07-17 CRAN (R 3.5.1)
graphics * 3.5.1 2018-07-03 local
grDevices * 3.5.1 2018-07-03 local
grid 3.5.1 2018-07-03 local
gtable 0.2.0 2016-02-26 CRAN (R 3.5.1)
haven 1.1.2 2018-06-27 CRAN (R 3.5.1)
hms 0.4.2 2018-03-10 CRAN (R 3.5.1)
httr 1.3.1 2017-08-20 CRAN (R 3.5.1)
jqr 1.1.0.9100 2019-01-18 Github (ropensci/jqr@4a43703)
jsonlite 1.5 2017-06-01 CRAN (R 3.5.1)
lattice 0.20-35 2017-03-25 CRAN (R 3.5.1)
lazyeval 0.2.1 2017-10-29 CRAN (R 3.5.1)
lubridate 1.7.4 2018-04-11 CRAN (R 3.5.1)
magrittr * 1.5 2014-11-22 CRAN (R 3.5.1)
memoise 1.1.0 2017-04-21 CRAN (R 3.5.1)
methods * 3.5.1 2018-07-03 local
modelr 0.1.2 2018-05-11 CRAN (R 3.5.1)
munsell 0.5.0 2018-06-12 CRAN (R 3.5.1)
nlme 3.1-137 2018-04-07 CRAN (R 3.5.1)
pillar 1.3.0 2018-07-14 CRAN (R 3.5.1)
pkgconfig 2.0.2 2018-08-16 CRAN (R 3.5.1)
plyr 1.8.4 2016-06-08 CRAN (R 3.5.1)
purrr * 0.2.5 2018-05-29 CRAN (R 3.5.1)
R6 2.2.2 2017-06-17 CRAN (R 3.5.1)
Rcpp 0.12.18 2018-07-23 CRAN (R 3.5.1)
readr * 1.1.1 2017-05-16 CRAN (R 3.5.1)
readxl 1.1.0 2018-04-20 CRAN (R 3.5.1)
rlang 0.2.2 2018-08-16 CRAN (R 3.5.1)
rstudioapi 0.7 2017-09-07 CRAN (R 3.5.1)
rvest 0.3.2 2016-06-17 CRAN (R 3.5.1)
scales 1.0.0 2018-08-09 CRAN (R 3.5.1)
stats * 3.5.1 2018-07-03 local
stringi 1.2.4 2018-07-20 CRAN (R 3.5.1)
stringr * 1.3.1 2018-05-10 CRAN (R 3.5.1)
tibble * 1.4.2 2018-01-22 CRAN (R 3.5.1)
tidyr * 0.8.1 2018-05-18 CRAN (R 3.5.1)
tidyselect 0.2.4 2018-02-26 CRAN (R 3.5.1)
tidyverse * 1.2.1 2017-11-14 CRAN (R 3.5.1)
tools 3.5.1 2018-07-03 local
utils * 3.5.1 2018-07-03 local
withr 2.1.2 2018-03-15 CRAN (R 3.5.1)
xml2 1.2.0 2018-01-24 CRAN (R 3.5.1)
Thanks for the report @mmuurr
We came upon this recently, see https://github.com/ropensci/geojson/issues/36
The answer is essentially that you're pushing too much data in at once, so try to push in smaller chunks if possible. Is it possible in your case?
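For the case where each element of the input vector is independently valid JSON, chunking can look like this minimal sketch (the helper name `jq_chunked` and the `.jq` argument are hypothetical, not part of jqr; `.jq` just makes the worker swappable):

```r
# Hypothetical helper: apply a jq program to a long character vector
# in fixed-size chunks, then recombine the per-chunk results.
jq_chunked <- function(json, program, chunk_size = 5000, .jq = jqr::jq) {
  # Assign each element a chunk index: 1..chunk_size -> 1, etc.
  idx <- split(seq_along(json), ceiling(seq_along(json) / chunk_size))
  # Run the jq program on each chunk separately.
  out <- lapply(idx, function(i) .jq(json[i], program))
  # Flatten back into one character vector.
  unlist(out, use.names = FALSE)
}
```

With a chunk size safely below the ~10K limit, each individual call stays small enough to avoid the protection stack overflow.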
@jeroen In this example, the same 100K-element JSON input works fine with jq on the CLI, so is there anything we can do to change this? If not, maybe we can help users split JSON into chunks and then recombine the results. That would work when, as in the example above, each element of the vector is valid JSON on its own, but it's not so easy otherwise.
@sckott, yeah, I'm working around it for now by chunking inputs (e.g. via readr::read_lines_chunked), but I thought I'd raise the issue for awareness :-)
glad you can break it up. we'll see what Jeroen says.
Hi there, just a polite re-surfacing of this issue, which I've run into again. I think breaking up long JSON (character) vectors into chunks works just great and designing a simple wrapper to do so that then recombines the jqr results is indeed relatively easy. I'm wondering if there's:
- Any guidance on what that appropriate chunk size would be in number of strings (i.e. length of vector) and/or
- If the chunking should be determined by total byte size of the chunks, which adds some (albeit small) complexity to the wrappers.
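To illustrate the byte-size option, here is a rough sketch of grouping strings so that each chunk stays under a byte budget (the name `chunk_by_bytes` and the budget default are illustrative assumptions, not jqr API):

```r
# Sketch: split a character vector into chunks whose combined byte
# size stays under a budget (a single oversized string still gets
# its own chunk rather than being dropped).
chunk_by_bytes <- function(json, max_bytes = 5e6) {
  sizes <- nchar(json, type = "bytes")
  group <- integer(length(json))
  acc <- 0
  g <- 1L
  for (i in seq_along(json)) {
    acc <- acc + sizes[i]
    # Start a new chunk when the budget is exceeded, unless this
    # element alone exceeds it (then it forms its own chunk).
    if (acc > max_bytes && acc != sizes[i]) {
      g <- g + 1L
      acc <- sizes[i]
    }
    group[i] <- g
  }
  split(json, group)
}
```

Each chunk returned by this helper could then be fed to jqr::jq() in turn and the results concatenated.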
Also should such a wrapper be integrated directly into jqr? (If so, I'd be happy to take a first stab at that wrapper and create a PR, though I'll pass on that effort if y'all don't believe it should be part of the package).
And if no chunking wrapper is built in, should jqr at least catch that specific error and give the user 'advice' (i.e. "hey user, try chunking")?
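The error-with-advice option might look like the sketch below (the wrapper name, message, and `.jq` argument are all hypothetical; the string matched is the one from the reprex above):

```r
# Sketch: wrap the jq call and translate a protection stack overflow
# into actionable advice instead of a raw error.
jq_safely <- function(json, program, .jq = jqr::jq) {
  tryCatch(
    .jq(json, program),
    error = function(e) {
      if (grepl("protection stack overflow", conditionMessage(e))) {
        stop("Input too large for a single jq() call; ",
             "try splitting it into smaller chunks.", call. = FALSE)
      }
      stop(e)  # re-raise anything unrelated unchanged
    }
  )
}
```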
Thanks @mmuurr - sorry for the delay on this.
It makes sense that the limit kicks in around 10K, given
https://github.com/stedolan/jq/blob/9b51a0852a0f91fbc987f5f2b302ff65e22f6399/src/parser.c#L1692 via https://github.com/stedolan/jq/issues/1054 and https://github.com/stedolan/jq/issues/1041
I think a wrapper belongs here in the package.
byte size does seem like it would be more appropriate.
can you send a PR and we can discuss from there
I am also experiencing this error when I feed more than 50k JSON strings into jq at once. I can chunk the data, of course, but it is a bit disruptive in my case.