WDI
WDI copied to clipboard
`jsonlite` vs. `RJSONIO`
@BPJandree recommended and tested the new version of WDI
which relies on jsonlite
instead of RJSONIO
.
Things seem to work OK, but I have just noticed that the new version is about 2x slower than the old one.
Do we have actual evidence that jsonlite
works better along other dimensions? Fewer network issues?
Is this worth it, or should I roll back the changes?
Could be because jsonlite performs additional validity checks, I recall reading some old discussions on that but I'll let others come in in case they have a more up-to-date view on this. Perhaps an idea is to keep the old get_page function and use either a tryCatch so that if it fails on https it tries the jsonlite version, or add a flag to the WDI() function that allows making the switch but keeps the old rjsonio as default if it's preferred.
Right. We are almost certainly paying for the convenience of a robust handling that returns neat data frames. But we did that faster with our customized list processing.
Note to self: try RcppSimdJSON
interesting package. Quick try on windows:
series_url="https://api.worldbank.org/v2/indicator?per_page=25000&format=json"
> system.time(series_dat <- RcppSimdJson::fload(series_url)[[2]])
user system elapsed
0.06 0.09 8.53
> system.time(series_dat <- RcppSimdJson::fload(series_url)[[2]])
user system elapsed
0.09 0.09 6.97
> system.time(series_dat <- jsonlite::fromJSON(series_url)[[2]])
user system elapsed
0.45 0.05 9.25
> system.time(series_dat <- jsonlite::fromJSON(series_url)[[2]])
user system elapsed
0.39 0.12 6.04
Download time is dominating.
I think maybe the C++17 requirement is a bit stringent - eg for a compute heavy package requirements on the compiler make sense, but for a general purpose package the audience may low little about such matters and they might get stuck needing to update. Thoughts?
Be interested to see if you get speed improvements. I'll play around with this package in a few workloads to get a feel.
yep, you're right. I did some tests last week and came to the same conclusion. Let's just use jsonlite
since that is implemented already.
I think it has the broadest base of users at the moment, so the slower processing should buy a bit of better maintainability and usability
Thanks, good points. I will close this now and prep a CRAN release in the next few days.