protection stack overflow errors on medium(?)-sized inputs

Open mmuurr opened this issue 5 years ago • 6 comments

When using jqr on largish inputs, I'm finding frequent protection stack overflow errors. Here's a simple reprex with 100,000 input strings (on my machine):

> foo <- replicate(100e3, sprintf('{"a":"A","b":"B","c":"C"}')) %>%
    jqr::jq("{a,b,c}")
Error: protect(): protection stack overflow

With 10,000 inputs, no error:

> replicate(10e3, sprintf('{"a":"A","b":"B","c":"C"}')) %>%
    jqr::jq("{a,b,c}") %>%
    str()
'jqson' chr [1:10000] "{\"a\":\"A\",\"b\":\"B\",\"c\":\"C\"}" "{\"a\":\"A\",\"b\":\"B\",\"c\":\"C\"}" "{\"a\":\"A\",\"b\":\"B\",\"c\":\"C\"}" ...

Any ideas?

Session Info
 Session info ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
  setting  value
  version  R version 3.5.1 (2018-07-02)
  system   x86_64, darwin16.7.0
  ui       unknown
  language (EN)
  collate  en_US.UTF-8
  tz       America/Denver
  date     2019-01-18

 Packages -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  package       * version    date       source
  assertthat      0.2.0      2017-04-11 CRAN (R 3.5.1)
  backports       1.1.2      2017-12-13 CRAN (R 3.5.1)
  base          * 3.5.1      2018-07-03 local
  bindr           0.1.1      2018-03-13 CRAN (R 3.5.1)
  bindrcpp        0.2.2      2018-03-29 CRAN (R 3.5.1)
  broom           0.5.0      2018-07-17 CRAN (R 3.5.1)
  cellranger      1.1.0      2016-07-27 CRAN (R 3.5.1)
  cli             1.0.0      2017-11-05 CRAN (R 3.5.1)
  colorspace      1.3-2      2016-12-14 CRAN (R 3.5.1)
  compiler        3.5.1      2018-07-03 local
  craftsy.utils   1.3.1      2018-08-29 local
  crayon          1.3.4      2017-09-16 CRAN (R 3.5.1)
  datasets      * 3.5.1      2018-07-03 local
  devtools        1.13.6     2018-06-27 CRAN (R 3.5.1)
  digest          0.6.18     2018-10-10 cran (@0.6.18)
  dplyr         * 0.7.6      2018-06-29 CRAN (R 3.5.1)
  forcats       * 0.3.0      2018-02-19 CRAN (R 3.5.1)
  ggplot2       * 3.0.0      2018-07-03 CRAN (R 3.5.1)
  glue            1.3.0      2018-07-17 CRAN (R 3.5.1)
  graphics      * 3.5.1      2018-07-03 local
  grDevices     * 3.5.1      2018-07-03 local
  grid            3.5.1      2018-07-03 local
  gtable          0.2.0      2016-02-26 CRAN (R 3.5.1)
  haven           1.1.2      2018-06-27 CRAN (R 3.5.1)
  hms             0.4.2      2018-03-10 CRAN (R 3.5.1)
  httr            1.3.1      2017-08-20 CRAN (R 3.5.1)
  jqr             1.1.0.9100 2019-01-18 Github (ropensci/jqr@4a43703)
  jsonlite        1.5        2017-06-01 CRAN (R 3.5.1)
  lattice         0.20-35    2017-03-25 CRAN (R 3.5.1)
  lazyeval        0.2.1      2017-10-29 CRAN (R 3.5.1)
  lubridate       1.7.4      2018-04-11 CRAN (R 3.5.1)
  magrittr      * 1.5        2014-11-22 CRAN (R 3.5.1)
  memoise         1.1.0      2017-04-21 CRAN (R 3.5.1)
  methods       * 3.5.1      2018-07-03 local
  modelr          0.1.2      2018-05-11 CRAN (R 3.5.1)
  munsell         0.5.0      2018-06-12 CRAN (R 3.5.1)
  nlme            3.1-137    2018-04-07 CRAN (R 3.5.1)
  pillar          1.3.0      2018-07-14 CRAN (R 3.5.1)
  pkgconfig       2.0.2      2018-08-16 CRAN (R 3.5.1)
  plyr            1.8.4      2016-06-08 CRAN (R 3.5.1)
  purrr         * 0.2.5      2018-05-29 CRAN (R 3.5.1)
  R6              2.2.2      2017-06-17 CRAN (R 3.5.1)
  Rcpp            0.12.18    2018-07-23 CRAN (R 3.5.1)
  readr         * 1.1.1      2017-05-16 CRAN (R 3.5.1)
  readxl          1.1.0      2018-04-20 CRAN (R 3.5.1)
  rlang           0.2.2      2018-08-16 CRAN (R 3.5.1)
  rstudioapi      0.7        2017-09-07 CRAN (R 3.5.1)
  rvest           0.3.2      2016-06-17 CRAN (R 3.5.1)
  scales          1.0.0      2018-08-09 CRAN (R 3.5.1)
  stats         * 3.5.1      2018-07-03 local
  stringi         1.2.4      2018-07-20 CRAN (R 3.5.1)
  stringr       * 1.3.1      2018-05-10 CRAN (R 3.5.1)
  tibble        * 1.4.2      2018-01-22 CRAN (R 3.5.1)
  tidyr         * 0.8.1      2018-05-18 CRAN (R 3.5.1)
  tidyselect      0.2.4      2018-02-26 CRAN (R 3.5.1)
  tidyverse     * 1.2.1      2017-11-14 CRAN (R 3.5.1)
  tools           3.5.1      2018-07-03 local
  utils         * 3.5.1      2018-07-03 local
  withr           2.1.2      2018-03-15 CRAN (R 3.5.1)
  xml2            1.2.0      2018-01-24 CRAN (R 3.5.1)

mmuurr, Jan 18 '19 21:01

Thanks for the report @mmuurr

We came upon this recently, see https://github.com/ropensci/geojson/issues/36

The answer is essentially that you're pushing too much data in at once, so try to push in smaller chunks if possible. Is it possible in your case?

@jeroen With this example, that 100K-length JSON works fine with jq on the CLI, so is there anything we can do to change this? If not, maybe we can help users split up JSON into chunks and then re-combine. That would work if it's like the example above, where each element in a vector is valid JSON, but it's not so easy otherwise.

sckott, Jan 18 '19 21:01

@sckott, yeah I'm accommodating for now by chunking inputs (e.g. via readr::read_lines_chunked), but I thought I'd just raise the issue for awareness :-)

mmuurr, Jan 18 '19 23:01

glad you can break it up. we'll see what Jeroen says.

sckott, Jan 18 '19 23:01

Hi there, just a polite re-surfacing of this issue, which I've run into again. Breaking up long JSON (character) vectors into chunks works just great, and designing a simple wrapper that does so and then recombines the jqr results is indeed relatively easy. I'm wondering:

  1. Is there any guidance on what the appropriate chunk size would be in number of strings (i.e. length of vector), and/or
  2. Should the chunking be determined by the total byte size of the chunks, which adds some (albeit small) complexity to the wrappers?

Also should such a wrapper be integrated directly into jqr? (If so, I'd be happy to take a first stab at that wrapper and create a PR, though I'll pass on that effort if y'all don't believe it should be part of the package).
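
For reference, a count-based version of such a wrapper might look roughly like the sketch below. This is only an illustration, not code from jqr: the name `apply_in_chunks`, its `f` argument, and the default `chunk_size` of 10,000 (taken from the size that worked in the reprex above) are all assumptions. It takes the processing step as a function so the chunking logic stays independent of jqr.

```r
# Hypothetical helper (not part of jqr): run `f` over `x` in fixed-size
# chunks and concatenate the results in the original order.
apply_in_chunks <- function(x, f, chunk_size = 10000L) {
  stopifnot(is.function(f), chunk_size >= 1)
  if (length(x) == 0L) return(character(0))

  # Chunk ids 1, 1, ..., 2, 2, ...; split() keeps numeric ids in order.
  chunk_id <- ceiling(seq_along(x) / chunk_size)
  chunks <- split(x, chunk_id)

  # unlist() drops attributes, so the result is a plain character vector.
  unlist(lapply(chunks, f), use.names = FALSE)
}

# Assumed usage with jqr installed:
# json <- rep('{"a":"A","b":"B","c":"C"}', 100e3)
# res <- apply_in_chunks(json, function(chunk) jqr::jq(chunk, "{a,b,c}"))
```

Because the results are concatenated with `unlist()`, any class that `f` attaches to its output (such as `jqson`) is lost and would need to be restored by the caller.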

And if no chunking wrapper is built in, should jqr catch that specific type of error and give the user some 'advice' (i.e. "hey user, try chunking")?
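
The "advice" idea could be sketched with base R's `tryCatch()`, matching on the error message shown in the reprex. The function name `with_chunk_advice` and the wording of the hint are hypothetical, and matching on message text is an assumption that would break if the message changed or were translated.

```r
# Hypothetical helper (not part of jqr): evaluate `expr` and, if it fails
# with R's protection stack overflow error, re-raise it with a hint.
with_chunk_advice <- function(expr) {
  tryCatch(
    expr,
    error = function(e) {
      msg <- conditionMessage(e)
      if (grepl("protection stack overflow", msg, fixed = TRUE)) {
        stop(
          "protect(): protection stack overflow. ",
          "The input may be too long to process in one call; ",
          "try splitting it into smaller chunks.",
          call. = FALSE
        )
      }
      # Any other error is re-signalled unchanged.
      stop(e)
    }
  )
}

# Assumed usage with jqr installed:
# with_chunk_advice(jqr::jq(json, "{a,b,c}"))
```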

mmuurr, Sep 19 '20 05:09

Thanks @mmuurr - sorry for the delay on this.

It makes sense that it's 10K, given https://github.com/stedolan/jq/blob/9b51a0852a0f91fbc987f5f2b302ff65e22f6399/src/parser.c#L1692 (found via https://github.com/stedolan/jq/issues/1054 and https://github.com/stedolan/jq/issues/1041).

I think a wrapper belongs here in the package.

byte size does seem like it would be more appropriate.
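
A byte-size-based split could be sketched as follows, using `nchar(x, type = "bytes")` to measure each string. The name `split_by_bytes` and the 1 MB default budget are made up for illustration; nothing in this thread establishes what a safe budget is, or that bytes rather than element count is what triggers the error.

```r
# Hypothetical helper (not part of jqr): greedily group strings so that
# each chunk holds at most `max_bytes` bytes. A single string larger than
# the budget is placed in a chunk of its own.
split_by_bytes <- function(x, max_bytes = 1e6) {
  stopifnot(is.character(x), max_bytes > 0)
  sizes <- nchar(x, type = "bytes")
  group <- integer(length(x))
  current <- 1L
  used <- 0
  for (i in seq_along(x)) {
    # Start a new chunk if this string would push a non-empty chunk
    # over the budget.
    if (used > 0 && used + sizes[i] > max_bytes) {
      current <- current + 1L
      used <- 0
    }
    group[i] <- current
    used <- used + sizes[i]
  }
  unname(split(x, group))
}

# Assumed usage with jqr installed:
# chunks <- split_by_bytes(json, max_bytes = 1e6)
# res <- unlist(lapply(chunks, function(ch) jqr::jq(ch, "{a,b,c}")))
```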

Can you send a PR so we can discuss from there?

sckott, Sep 22 '20 17:09

I am also experiencing this error when I feed more than 50k JSON strings into jq to process. I can chunk the data, of course, but it is a bit disruptive in my case.

DataStrategist, Dec 06 '23 14:12