bioRad icon indicating copy to clipboard operation
bioRad copied to clipboard

Use httr instead of (r)curl

Open peterdesmet opened this issue 6 years ago • 6 comments

From the rOpenSci onboarding guide:

For http requests we strongly recommend using httr over RCurl.

We need to tackle RCurl vs curl for #131 anyway, so might be good to use httr instead.

peterdesmet avatar Sep 26 '18 13:09 peterdesmet

Current functionality (0.3.0.9170) from INBO network:

library(bioRad)
start_time <- Sys.time()
download_vpfiles(
  "2016-10-01",
  "2016-11-30",
  c("be"),
  c("jab", "wid"),
  directory = "my_data"
)
end_time <- Sys.time()
difftime(end_time, start_time, units = "secs")

Console:

> library(bioRad)
Welcome to bioRad version 0.3.0.9170
Docker daemon running, Docker functionality enabled.
> start_time <- Sys.time()
> download_vpfiles(
+   "2016-10-01",
+   "2016-11-30",
+   c("be"),
+   c("jab", "wid"),
+   directory = "my_data"
+ )
[1] "Downloading file bejab201610.zip"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0 51.8M    0 68213  33 51.8M   33 6662k    0     0  5494k      0  0:00:09100 51.8M  100 51.8M    0     0  9277k      0  0:00:05  0:00:05 --:--:-- 10.0M
[1] "Downloading file bewid201610.zip"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0      22 59.0M   22 1448k    0     0  3967k    100 59.0M  100 59.0M    0     0  2769k      0  0:00:21  0:00:21 --:--:-- 2490k
[1] "no data available at URL https://lw-enram.s3-eu-west-1.amazonaws.com/be/jab/2016/bejab201611.zip"
[1] "no data available at URL https://lw-enram.s3-eu-west-1.amazonaws.com/be/wid/2016/bewid201611.zip"
> end_time <- Sys.time()
> difftime(end_time, start_time, units = "secs")
Time difference of 52.90403 secs

peterdesmet avatar Dec 04 '18 13:12 peterdesmet

The dependency of Rcurl can be removed when we replace the getBinaryURL function which is used to check the existence of an URL in the helper function check_url_existence. We could replace this by the httr function http_error which provides the same functionality:

library(httr)
url <- "https://lw-enram.s3-eu-west-1.amazonaws.com/be/jab/2019/bejab201910.zip"
http_error(url)

So, an easy fix actually to comply to ropensci recommendation (and one internal function less).

Still, with #131 in mind, we could also get rid of the url existence check (http_error would be the only httr function) and for example, put the curl_download function into a tryCatch environment, which makes us only rely on curl:

  tryCatch(z <- curl_download(url, "~/Documents/temp/test.zip",
                              quiet = FALSE),
           error = function(e) {
             print(paste("no data available at URL", url))
           }
  )

but, unless we can improve the error inside the handle, we would override potential other errors (e.g. path not writeable) with our custom message making it less useful for the package user. I like it more to have the check seperate, but this would be something I would have to check how to do with curl-only.

Notice that httr also relies on curl.

stijnvanhoey avatar Dec 04 '18 22:12 stijnvanhoey

@stijnvanhoey I’m working on this, I can show you what I have already. See e.g. https://github.com/r-lib/httr/issues/553

peterdesmet avatar Dec 04 '18 22:12 peterdesmet

To be clear: there is no need to replace the current curl_download function (from curl package) with a function from httr, as this would not remove a dependency (httr relies on curl)...

The pre-check of an existing url (http_error) already exists in httr package, so when ok to keep httr and curl in biorad, the fix is easy. When the purpose is to have less dependencies, it is better to only depend on curl instead of on httr.

stijnvanhoey avatar Dec 04 '18 23:12 stijnvanhoey

Hi @sckott, some context: this is a package (bioRad) for the analysis biological targets in weather radar data. One function allows to bulk download files from a data repository (an Amazon S3 bucket) and currently relies on RCurl and curl. To reduce the number of dependencies, we can simplify that to just using curl, trying to keep functionality to fail nicely if a certain data file is not available in the bucket.

But we are also planning to submit the package for review to rOpenSci at some point, which guidelines strongly recommend the use of httr of RCurl. Would you advise us to just use curl (just one dependency) or httr (on top of curl, so 2 dependencies) here?

peterdesmet avatar Dec 05 '18 07:12 peterdesmet

of the two RCurl and curl, I agree to use curl only

If you don't need functionality of httr, then sticking with curl is just fine.

We do maintain an alternative to httr - https://github.com/ropensci/crul/ - which we'll add to the guidelines as an alternative to httr

i'd say stick with curl if it meets your needs

sckott avatar Dec 05 '18 15:12 sckott

Rcurl was removed in favour of curl in #176 in 2018 and was included in bioRad 0.4.0. httr is not used.

peterdesmet avatar Mar 10 '23 16:03 peterdesmet