WDI icon indicating copy to clipboard operation
WDI copied to clipboard

`jsonlite` vs. `RJSONIO`

Open vincentarelbundock opened this issue 2 years ago • 2 comments

@BPJandree recommended and tested the new version of WDI which relies on jsonlite instead of RJSONIO.

Things seem to work OK, but I have just noticed that the new version is about 2x slower than the old one.

Do we have actual evidence that jsonlite works better along other dimensions? Fewer network issues?

Is this worth it, or should I roll back the changes?

vincentarelbundock avatar Jul 29 '22 14:07 vincentarelbundock

Could be because jsonlite performs additional validity checks, I recall reading some old discussions on that but I'll let others come in in case they have a more up-to-date view on this. Perhaps an idea is to keep the old get_page function and use either a tryCatch so that if it fails on https it tries the jsonlite version, or add a flag to the WDI() function that allows making the switch but keeps the old rjsonio as default if it's preferred.

BPJandree avatar Jul 29 '22 14:07 BPJandree

Right. We are almost certainly paying for the convenience of a robust handling that returns neat data frames. But we did that faster with our customized list processing.

vincentarelbundock avatar Jul 29 '22 14:07 vincentarelbundock

Note to self: try RcppSimdJSON

vincentarelbundock avatar Aug 20 '22 10:08 vincentarelbundock

interesting package. Quick try on windows:

series_url="https://api.worldbank.org/v2/indicator?per_page=25000&format=json"
> system.time(series_dat <- RcppSimdJson::fload(series_url)[[2]])
   user  system elapsed 
   0.06    0.09    8.53 
> system.time(series_dat <- RcppSimdJson::fload(series_url)[[2]])
   user  system elapsed 
   0.09    0.09    6.97 
> system.time(series_dat <- jsonlite::fromJSON(series_url)[[2]])
   user  system elapsed 
   0.45    0.05    9.25 
> system.time(series_dat <- jsonlite::fromJSON(series_url)[[2]])
   user  system elapsed 
   0.39    0.12    6.04 

Download time is dominating.

I think maybe the C++17 requirement is a bit stringent - eg for a compute heavy package requirements on the compiler make sense, but for a general purpose package the audience may low little about such matters and they might get stuck needing to update. Thoughts?

Be interested to see if you get speed improvements. I'll play around with this package in a few workloads to get a feel.

BPJandree avatar Aug 23 '22 15:08 BPJandree

yep, you're right. I did some tests last week and came to the same conclusion. Let's just use jsonlite since that is implemented already.

vincentarelbundock avatar Aug 23 '22 15:08 vincentarelbundock

I think it has the broadest base of users at the moment, so the slower processing should buy a bit of better maintainability and usability

BPJandree avatar Aug 24 '22 10:08 BPJandree

Thanks, good points. I will close this now and prep a CRAN release in the next few days.

vincentarelbundock avatar Aug 24 '22 20:08 vincentarelbundock