bioRad
bioRad copied to clipboard
Use httr instead of (r)curl
From the rOpenSci onboarding guide:
For http requests we strongly recommend using httr over RCurl.
We need to tackle RCurl vs curl for #131 anyway, so might be good to use httr instead.
Current functionality (0.3.0.9170) from INBO network:
library(bioRad)
start_time <- Sys.time()
download_vpfiles(
"2016-10-01",
"2016-11-30",
c("be"),
c("jab", "wid"),
directory = "my_data"
)
end_time <- Sys.time()
difftime(end_time, start_time, units = "secs")
Console:
> library(bioRad)
Welcome to bioRad version 0.3.0.9170
Docker daemon running, Docker functionality enabled.
> start_time <- Sys.time()
> download_vpfiles(
+ "2016-10-01",
+ "2016-11-30",
+ c("be"),
+ c("jab", "wid"),
+ directory = "my_data"
+ )
[1] "Downloading file bejab201610.zip"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 51.8M 0 68213 33 51.8M 33 6662k 0 0 5494k 0 0:00:09100 51.8M 100 51.8M 0 0 9277k 0 0:00:05 0:00:05 --:--:-- 10.0M
[1] "Downloading file bewid201610.zip"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 22 59.0M 22 1448k 0 0 3967k 100 59.0M 100 59.0M 0 0 2769k 0 0:00:21 0:00:21 --:--:-- 2490k
[1] "no data available at URL https://lw-enram.s3-eu-west-1.amazonaws.com/be/jab/2016/bejab201611.zip"
[1] "no data available at URL https://lw-enram.s3-eu-west-1.amazonaws.com/be/wid/2016/bewid201611.zip"
> end_time <- Sys.time()
> difftime(end_time, start_time, units = "secs")
Time difference of 52.90403 secs
The dependency of Rcurl
can be removed when we replace the getBinaryURL
function which is used to check the existence of an URL in the helper function check_url_existence
. We could replace this by the httr
function http_error
which provides the same functionality:
library(httr)
url <- "https://lw-enram.s3-eu-west-1.amazonaws.com/be/jab/2019/bejab201910.zip"
http_error(url)
So, an easy fix actually to comply to ropensci
recommendation (and one internal function less).
Still, with #131 in mind, we could also get rid of the url existence check (http_error
would be the only httr
function) and for example, put the curl_download
function into a tryCatch
environment, which makes us only rely on curl
:
tryCatch(z <- curl_download(url, "~/Documents/temp/test.zip",
quiet = FALSE),
error = function(e) {
print(paste("no data available at URL", url))
}
)
but, unless we can improve the error inside the handle, we would override potential other errors (e.g. path not writeable) with our custom message making it less useful for the package user. I like it more to have the check seperate, but this would be something I would have to check how to do with curl-only.
Notice that httr
also relies on curl.
@stijnvanhoey I’m working on this, I can show you what I have already. See e.g. https://github.com/r-lib/httr/issues/553
To be clear: there is no need to replace the current curl_download
function (from curl
package) with a function from httr
, as this would not remove a dependency (httr
relies on curl
)...
The pre-check of an existing url (http_error
) already exists in httr
package, so when ok to keep httr
and curl
in biorad, the fix is easy. When the purpose is to have less dependencies, it is better to only depend on curl
instead of on httr
.
Hi @sckott, some context: this is a package (bioRad) for the analysis biological targets in weather radar data. One function allows to bulk download files from a data repository (an Amazon S3 bucket) and currently relies on RCurl
and curl
. To reduce the number of dependencies, we can simplify that to just using curl
, trying to keep functionality to fail nicely if a certain data file is not available in the bucket.
But we are also planning to submit the package for review to rOpenSci at some point, which guidelines strongly recommend the use of httr
of RCurl
. Would you advise us to just use curl
(just one dependency) or httr
(on top of curl
, so 2 dependencies) here?
of the two RCurl
and curl
, I agree to use curl
only
If you don't need functionality of httr
, then sticking with curl
is just fine.
We do maintain an alternative to httr
- https://github.com/ropensci/crul/ - which we'll add to the guidelines as an alternative to httr
i'd say stick with curl
if it meets your needs
Rcurl was removed in favour of curl
in #176 in 2018 and was included in bioRad 0.4.0. httr
is not used.