curl
Use with mclapply
I was wondering what the best way is to use the curl package with the mcparallel and mclapply family of functions, which call fork under the hood.
As I understand it, libcurl requires that curl_global_cleanup be called immediately before a fork and that curl_global_init be called immediately after it.
Is there any way to do that or something equivalent with the current implementation of the package?
You shouldn't do that. If you want concurrent requests, use the multi_run system.
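For readers unfamiliar with the multi interface mentioned above, here is a minimal sketch of how concurrent requests might be issued from a single process, with no fork involved. It assumes the curl package's `new_pool()`, `curl_fetch_multi()` and `multi_run()` functions; the helper name `fetch_all` is hypothetical and not part of the package.

```r
# Sketch only: fetch several URLs concurrently in ONE process (no fork()).
# `fetch_all` is a hypothetical helper name, not part of the curl package.
fetch_all <- function(urls) {
  pool <- curl::new_pool()
  results <- vector("list", length(urls))
  names(results) <- urls
  for (i in seq_along(urls)) {
    local({
      idx <- i  # capture the current index for the callbacks
      curl::curl_fetch_multi(
        urls[[idx]],
        done = function(res) results[[idx]] <<- res,               # response on success
        fail = function(msg) results[[idx]] <<- simpleError(msg),  # condition on failure
        pool = pool
      )
    })
  }
  # Perform all scheduled requests; returns once the pool is drained
  curl::multi_run(pool = pool)
  results
}

# Usage (needs network access):
# res <- fetch_all(c("https://www.google.com", "https://www.facebook.com"))
# vapply(res, inherits, logical(1), what = "error")
```

Failures are stored as condition objects rather than raised, so one bad URL does not abort the others.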
That would be a good way to do it, yes. But the thing is that I'm using curl to access several different API endpoints in a rather complex processing logic. So I'm splitting up a data stream and calling mclapply to "process" each part. And part of that processing involves calling different external APIs, but most of it is time-consuming processing that does not involve curl.
It would be impractical to refactor the code so as to avoid using curl inside the parallel parts, I'm afraid.
I'm currently testing something like this:
mclapply <- function(X, FUN, ..., mc.preschedule = TRUE, mc.set.seed = TRUE,
                     mc.silent = FALSE, mc.cores = getOption("mc.cores", 2L),
                     mc.cleanup = TRUE, mc.allow.recursive = TRUE) {
  if ("curl" %in% loadedNamespaces()) {
    # Unload curl in the parent before forking; reload it when we return
    detach("package:curl", unload = TRUE)
    on.exit(library("curl"))
    # Each child loads curl itself and unloads it when done
    FIXEDFUN <- function(...) {
      library("curl")
      on.exit(detach("package:curl", unload = TRUE))
      FUN(...)
    }
  } else {
    FIXEDFUN <- FUN
  }
  # Call the real implementation explicitly so this wrapper does not recurse
  parallel::mclapply(X, FIXEDFUN, ..., mc.preschedule = mc.preschedule,
                     mc.set.seed = mc.set.seed, mc.silent = mc.silent,
                     mc.cores = mc.cores, mc.cleanup = mc.cleanup,
                     mc.allow.recursive = mc.allow.recursive)
}
If I try this on OS X I get an SSL certificate validation error:
> mclapply(c("https://www.google.com", "https://www.facebook.com"), HEAD)
(...)
util.mclapply: bad entry 1/2 [Class 'try-error' atomic [1:1] Error in curl::curl_fetch_memory(url, handle = handle) : ]
util.mclapply: bad entry 1/2 [ SSL certificate problem: Invalid certificate chain]
util.mclapply: bad entry 1/2 []
util.mclapply: bad entry 1/2 [ ..- attr(*, "condition")=List of 2]
util.mclapply: bad entry 1/2 [ .. ..$ message: chr "SSL certificate problem: Invalid certificate chain"]
util.mclapply: bad entry 1/2 [ .. ..$ call : language curl::curl_fetch_memory(url, handle = handle)]
util.mclapply: bad entry 1/2 [ .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"]
Of course, it works perfectly with lapply, so this seems to be a problem that arises from the parallelism:
> lapply(c("https://www.google.com", "https://www.facebook.com"), HEAD)
[[1]]
Response [https://www.google.com.br/?gws_rd=cr&dcr=0&ei=fBcYWqX5K8GYwASw14egDQ]
Date: 2017-11-24 12:58
Status: 200
Content-Type: text/html; charset=ISO-8859-1
<EMPTY BODY>
[[2]]
Response [https://www.facebook.com/unsupportedbrowser]
Date: 2017-11-24 12:58
Status: 200
Content-Type: text/html; charset=UTF-8
<EMPTY BODY>
Which version of OSX do you have? It should be fine with the latest OS-X I think.
Aren't macOS system libs non-forkable in general? And of course macOS libcurl links to system libs:
❯ otool -L /usr/lib/libcurl.3.dylib
/usr/lib/libcurl.3.dylib:
/usr/lib/libcurl.4.dylib (compatibility version 7.0.0, current version 9.0.0)
/System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 1.0.0, current version 57740.60.18)
/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1349.8.0)
/System/Library/Frameworks/LDAP.framework/Versions/A/LDAP (compatibility version 1.0.0, current version 2.4.0)
/System/Library/Frameworks/Kerberos.framework/Versions/A/Kerberos (compatibility version 5.0.0, current version 6.0.0)
/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.8)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)
Apple has switched back from native SSL to LibreSSL in OSX 10.13 so now curl is fork-safe again:
$ otool -L /usr/lib/libcurl.3.dylib
/usr/lib/libcurl.3.dylib:
/usr/lib/libcurl.4.dylib (compatibility version 7.0.0, current version 9.0.0)
/usr/lib/libcrypto.35.dylib (compatibility version 36.0.0, current version 36.0.0)
/usr/lib/libssl.35.dylib (compatibility version 36.0.0, current version 36.0.0)
/System/Library/Frameworks/LDAP.framework/Versions/A/LDAP (compatibility version 1.0.0, current version 2.4.0)
/System/Library/Frameworks/Kerberos.framework/Versions/A/Kerberos (compatibility version 5.0.0, current version 6.0.0)
/usr/lib/libapple_nghttp2.dylib (compatibility version 1.0.0, current version 1.24.0)
/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)
See also in R:
> library(curl)
> curl_version()
$version
[1] "7.54.0"
$ssl_version
[1] "LibreSSL/2.0.20"
$libz_version
[1] "1.2.11"
@asieira what do you see for curl_version() in R? Can you try reinstalling the curl package from source?
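As a quick way to answer that question, the sketch below classifies the `ssl_version` string that `curl_version()` returns. The helper name `tls_backend` is hypothetical. The assumption, taken from this thread rather than from libcurl documentation, is that a build reporting "SecureTransport" uses Apple's native SSL, while a string such as "LibreSSL/2.0.20" indicates the LibreSSL build shown above.

```r
# Sketch only: `tls_backend` is a hypothetical helper, not part of curl.
# It extracts the backend name from a string like "LibreSSL/2.0.20".
tls_backend <- function(ssl_version) {
  if (is.null(ssl_version) || is.na(ssl_version) || !nzchar(ssl_version)) {
    return("none")
  }
  if (grepl("SecureTransport", ssl_version, fixed = TRUE)) {
    return("SecureTransport")
  }
  # Keep the part before the first "/", e.g. "LibreSSL/2.0.20" -> "LibreSSL"
  sub("/.*$", "", ssl_version)
}

# Usage (requires the curl package):
# tls_backend(curl::curl_version()$ssl_version)
```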
"Apple has switched back from native SSL to LibreSSL in OSX 10.13 so now curl is fork-safe again"
Well, OK, that's good. But of course it will not work on previous OSX versions, and who knows if it will work in the future?
Plus other things might fail as well, because it still links to other system libs.
I think it is just better not to use curl with mclapply....
"I think it is just better not to use curl with mclapply...."
Yes I agree 100%, already mentioned that above. But he really wants it, I think it should be possible on High Sierra. However his code will not be portable.
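One portable alternative, if avoiding fork is acceptable, is a socket (PSOCK) cluster from the parallel package: each worker is a fresh R process, so nothing is inherited across a fork. The sketch below is an illustration, not something proposed in this thread; the wrapper name `psock_lapply` and its `cores` argument are hypothetical.

```r
# Sketch only: `psock_lapply` is a hypothetical stand-in for mclapply that
# starts fresh R worker processes instead of forking the current one.
psock_lapply <- function(X, FUN, ..., cores = 2L) {
  cl <- parallel::makeCluster(cores)             # default type is "PSOCK"
  on.exit(parallel::stopCluster(cl), add = TRUE) # always shut the workers down
  parallel::parLapply(cl, X, FUN, ...)
}

# Usage (needs network access and the httr package on the workers):
# psock_lapply(c("https://www.google.com", "https://www.facebook.com"),
#              function(u) httr::status_code(httr::HEAD(u)))
```

The trade-off is that workers do not share the parent's memory: FUN should reference packages with `pkg::` or load them itself, and any data it needs has to be passed as an argument or exported with `parallel::clusterExport()`.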
Also, you cannot just call curl_global_cleanup before forking. What if other packages have active curl handles?
Oh right I didn't even see that. @asieira you should just use parallel::mclapply() instead of your own version.
Same problem:
> parallel::mclapply(c("https://www.google.com", "https://www.facebook.com"), HEAD)
[[1]]
[1] "Error in curl::curl_fetch_memory(url, handle = handle) : \n SSL certificate problem: Invalid certificate chain\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in curl::curl_fetch_memory(url, handle = handle): SSL certificate problem: Invalid certificate chain>
[[2]]
[1] "Error in curl::curl_fetch_memory(url, handle = handle) : \n SSL certificate problem: Invalid certificate chain\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in curl::curl_fetch_memory(url, handle = handle): SSL certificate problem: Invalid certificate chain>
Warning message:
In parallel::mclapply(c("https://www.google.com", "https://www.facebook.com"), :
all scheduled cores encountered errors in user code
Just to add to the problems: detach(..., unload = TRUE) does not actually unload the shared library:
❯ library(curl)
✔ 65.7 MiB master* ↑
❯ detach("package:curl", unload=TRUE)
✔ 65.7 MiB master* ↑
❯ getLoadedDLLs()
Filename
base base
methods /Library/Frameworks/R.framework/Versions/3.4/Resources/library/methods/libs/methods.so
grDevices /Library/Frameworks/R.framework/Versions/3.4/Resources/library/grDevices/libs/grDevices.so
graphics /Library/Frameworks/R.framework/Versions/3.4/Resources/library/graphics/libs/graphics.so
utils /Library/Frameworks/R.framework/Versions/3.4/Resources/library/utils/libs/utils.so
stats /Library/Frameworks/R.framework/Versions/3.4/Resources/library/stats/libs/stats.so
memuse /Users/gaborcsardi/r_pkgs/memuse/libs/memuse.so
curl /Users/gaborcsardi/r_pkgs/curl/libs/curl.so
tools /Library/Frameworks/R.framework/Versions/3.4/Resources/library/tools/libs/tools.so
Dynamic.Lookup
base FALSE
methods FALSE
grDevices FALSE
graphics FALSE
utils FALSE
stats FALSE
memuse FALSE
curl TRUE
tools FALSE
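The same check can be scripted instead of read off the table. A minimal sketch, using only base R's `getLoadedDLLs()`; the helper name `dll_loaded` is hypothetical:

```r
# Sketch only: `dll_loaded` is a hypothetical helper.
# Returns TRUE if a shared library with this name is currently loaded.
dll_loaded <- function(name) {
  name %in% names(getLoadedDLLs())
}

# Usage, mirroring the session above (requires the curl package):
# library(curl)
# detach("package:curl", unload = TRUE)
# dll_loaded("curl")
```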
[Screenshot: screen shot 2017-11-24 at 2 32 33 pm]
I'm guessing @asieira may have something else loaded in his process that is not fork safe. Are you running a completely vanilla R session in the R terminal?
Seems like that is the case indeed:
R version 3.4.2 (2017-09-28) -- "Short Summer"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
NULL
> library(httr)
> parallel::mclapply(c("https://www.google.com", "https://www.facebook.com"), HEAD)
[[1]]
Response [https://www.google.com.br/?gws_rd=cr&dcr=0&ei=-iAYWpvwNoLwwASViq64Bw]
Date: 2017-11-24 13:39
Status: 200
Content-Type: text/html; charset=ISO-8859-1
<EMPTY BODY>
[[2]]
Response [https://www.facebook.com/unsupportedbrowser]
Date: 2017-11-24 13:39
Status: 200
Content-Type: text/html; charset=UTF-8
<EMPTY BODY>
Funnily enough, when I used R.app to do the same thing it just hung. This only worked for me on Terminal. :(
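Since behaviour differs between front ends here, one conservative option is to fall back to a single core whenever R runs inside a GUI. The sketch below is a hedged illustration: the helper name `fork_cores` and the list of GUI names are assumptions, based on the values `.Platform$GUI` typically reports ("AQUA" for R.app, "RStudio" for RStudio, "Rgui" on Windows).

```r
# Sketch only: `fork_cores` is a hypothetical helper and the policy is an
# assumption. It disables forking (1 core) when R runs inside a GUI front end.
fork_cores <- function(requested, gui = .Platform$GUI) {
  gui_frontends <- c("AQUA", "RStudio", "Rgui")
  if (gui %in% gui_frontends) 1L else as.integer(requested)
}

# Usage:
# parallel::mclapply(urls, httr::HEAD, mc.cores = fork_cores(4L))
```

With `mc.cores = 1L`, mclapply does not fork, so this trades speed for predictability in GUI sessions.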
Can anyone else confirm if you have the same SSL issue I had with parallel::mclapply when running inside RStudio? Maybe that's what's making a difference.
Works fine for me in rstudio and R.app. What else is in your sessionInfo()?
Lots of things:
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] compiler tools stats graphics grDevices datasets parallel utils methods
[10] base
other attached packages:
[1] xml2_1.1.1 punycode_0.2.5 dplyr_0.5.0 base64enc_0.1-3
[5] RJSONIO_1.3-0 digest_0.6.12 bitops_1.0-6 RApiSerialize_0.1.0
[9] bit64_0.9-7 bit_1.1-12 urltools_1.6.0.9000 curl_3.0
[13] SnakeCharmR_1.0.6 igraph_1.1.2 R.utils_2.5.0 R.oo_1.21.0
[17] R.methodsS3_1.7.1 Rcpp_0.12.13 stringdist_0.9.4.6 gdata_2.18.0
[21] pryr_0.1.3 testthat_1.0.2 httr_1.3.1 stringr_1.2.0
[25] hash_2.2.6 data.table_1.9.6 futile.logger_1.4.3 jsonlite_1.5
loaded via a namespace (and not attached):
[1] futile.options_1.0.0 tibble_1.3.4 rlang_0.1.2 pkgconfig_2.0.1
[5] DBI_0.7 rstudioapi_0.7 yaml_2.1.14 gtools_3.5.0
[9] triebeard_0.3.0 R6_2.2.2 lambda.r_1.2 magrittr_1.5
[13] codetools_0.2-15 assertthat_0.2.0 stringi_1.1.5 chron_2.3-51
[17] crayon_1.3.4
So the problem is likely not curl but one of those things that is not fork safe.
Reduced the scope even further with a clean RStudio session... I don't get the SSL error anymore, but don't get a correct response either:
R version 3.4.2 (2017-09-28) -- "Short Summer"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
NULL
> library(httr)
> parallel::mclapply(c("https://www.google.com", "https://www.facebook.com"), HEAD)
[[1]]
NULL
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices datasets parallel utils methods base
other attached packages:
[1] httr_1.3.1 futile.logger_1.4.3
loaded via a namespace (and not attached):
[1] compiler_3.4.2 R6_2.2.2 tools_3.4.2 lambda.r_1.2
[5] yaml_2.1.14 futile.options_1.0.0
This seems relevant: https://curl.haxx.se/mail/lib-2013-03/0226.html
Hi @jeroen and @asieira, I was wondering whether you managed to find a solution for this? Many thanks!