Use with mclapply

Open asieira opened this issue 7 years ago • 25 comments

I was wondering what the best way is to use the curl package with the mcparallel and mclapply family of functions, which call fork under the hood.

As I understand it, libcurl requires that curl_global_cleanup be called immediately before a fork and that curl_global_init is called immediately after it.

Is there any way to do that or something equivalent with the current implementation of the package?

asieira avatar Nov 24 '17 11:11 asieira

You shouldn't do that. If you want concurrent requests use the multi_run system.

jeroen avatar Nov 24 '17 11:11 jeroen
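
A minimal sketch of the multi interface mentioned above, assuming the goal is only to fetch several URLs concurrently inside one R process. `fetch_all` is a hypothetical helper name; `new_pool()`, `curl_fetch_multi()` and `multi_run()` come from the curl package.

```r
library(curl)

# Fetch several URLs concurrently in a single R process (no fork involved).
# `fetch_all` is a hypothetical helper, not part of the curl package.
fetch_all <- function(urls) {
  pool <- new_pool()
  results <- vector("list", length(urls))
  names(results) <- urls

  for (i in seq_along(urls)) {
    local({
      idx <- i  # capture the index for the callbacks below
      curl_fetch_multi(
        urls[[idx]],
        done = function(res) results[[idx]] <<- res,
        fail = function(msg) results[[idx]] <<- simpleError(msg),
        pool = pool
      )
    })
  }

  # Blocks until every scheduled request has completed or failed.
  multi_run(pool = pool)
  results
}
```

Calling `fetch_all(c("https://www.google.com", "https://www.facebook.com"))` should return one response list (or an error object) per URL. This covers only the network part; it does not parallelize CPU-bound processing.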

That would be a good way to do it, yes. But the thing is that I'm using curl to access several different API endpoints in a rather complex processing logic. So I'm splitting up a data stream and calling mclapply to "process" each part. And part of that processing involves calling different external APIs, but most of it is time-consuming processing that does not involve curl.

It would be impractical to refactor the code so as to avoid using curl inside the parallel parts, I'm afraid.

I'm currently testing something like this:

mclapply <- function(X, FUN, ..., mc.preschedule = TRUE, mc.set.seed = TRUE,
                     mc.silent = FALSE, mc.cores = getOption("mc.cores", 2L),
                     mc.cleanup = TRUE, mc.allow.recursive = TRUE) {
  if ("curl" %in% loadedNamespaces()) {
    detach("package:curl", unload = TRUE)
    on.exit(library("curl"))
    FIXEDFUN <- function(...) {
      library("curl")
      on.exit(detach("package:curl", unload = TRUE))
      FUN(...)
    }
  } else
    FIXEDFUN <- FUN

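  # Call the real implementation explicitly; a bare mclapply() here would
  # resolve to this wrapper and recurse.
  parallel::mclapply(X, FIXEDFUN, ..., mc.preschedule = mc.preschedule,
                     mc.set.seed = mc.set.seed, mc.silent = mc.silent,
                     mc.cores = mc.cores, mc.cleanup = mc.cleanup,
                     mc.allow.recursive = mc.allow.recursive)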
}

asieira avatar Nov 24 '17 12:11 asieira

If I try this on OS X I get an SSL certificate validation error:

> mclapply(c("https://www.google.com", "https://www.facebook.com"), HEAD)
(...)
util.mclapply: bad entry 1/2 [Class 'try-error'  atomic [1:1] Error in curl::curl_fetch_memory(url, handle = handle) : ]
util.mclapply: bad entry 1/2 [  SSL certificate problem: Invalid certificate chain]
util.mclapply: bad entry 1/2 []
util.mclapply: bad entry 1/2 [  ..- attr(*, "condition")=List of 2]
util.mclapply: bad entry 1/2 [  .. ..$ message: chr "SSL certificate problem: Invalid certificate chain"]
util.mclapply: bad entry 1/2 [  .. ..$ call   : language curl::curl_fetch_memory(url, handle = handle)]
util.mclapply: bad entry 1/2 [  .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"]

Of course, it works perfectly with lapply so this seems to be a problem that arises from the parallelism:

> lapply(c("https://www.google.com", "https://www.facebook.com"), HEAD)
[[1]]
Response [https://www.google.com.br/?gws_rd=cr&dcr=0&ei=fBcYWqX5K8GYwASw14egDQ]
  Date: 2017-11-24 12:58
  Status: 200
  Content-Type: text/html; charset=ISO-8859-1
<EMPTY BODY>

[[2]]
Response [https://www.facebook.com/unsupportedbrowser]
  Date: 2017-11-24 12:58
  Status: 200
  Content-Type: text/html; charset=UTF-8
<EMPTY BODY>

asieira avatar Nov 24 '17 13:11 asieira

Which version of OSX do you have? It should be fine with the latest OS-X I think.

jeroen avatar Nov 24 '17 13:11 jeroen

"Which version of OSX do you have? It should be fine with the latest OS-X I think."

Aren't macOS system libs non-forkable in general? And of course macOS libcurl links to system libs:

❯ otool -L /usr/lib/libcurl.3.dylib
/usr/lib/libcurl.3.dylib:
	/usr/lib/libcurl.4.dylib (compatibility version 7.0.0, current version 9.0.0)
	/System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 1.0.0, current version 57740.60.18)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1349.8.0)
	/System/Library/Frameworks/LDAP.framework/Versions/A/LDAP (compatibility version 1.0.0, current version 2.4.0)
	/System/Library/Frameworks/Kerberos.framework/Versions/A/Kerberos (compatibility version 5.0.0, current version 6.0.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.8)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)

gaborcsardi avatar Nov 24 '17 13:11 gaborcsardi

Apple has switched back from native SSL to LibreSSL in OSX 10.13 so now curl is fork-safe again:

$ otool -L /usr/lib/libcurl.3.dylib
/usr/lib/libcurl.3.dylib:
	/usr/lib/libcurl.4.dylib (compatibility version 7.0.0, current version 9.0.0)
	/usr/lib/libcrypto.35.dylib (compatibility version 36.0.0, current version 36.0.0)
	/usr/lib/libssl.35.dylib (compatibility version 36.0.0, current version 36.0.0)
	/System/Library/Frameworks/LDAP.framework/Versions/A/LDAP (compatibility version 1.0.0, current version 2.4.0)
	/System/Library/Frameworks/Kerberos.framework/Versions/A/Kerberos (compatibility version 5.0.0, current version 6.0.0)
	/usr/lib/libapple_nghttp2.dylib (compatibility version 1.0.0, current version 1.24.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)

See also in R:

> library(curl)
> curl_version()
$version
[1] "7.54.0"

$ssl_version
[1] "LibreSSL/2.0.20"

$libz_version
[1] "1.2.11"

jeroen avatar Nov 24 '17 13:11 jeroen
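
The same check can be scripted. This is a rough sketch, assuming that a libcurl built against Apple's native TLS reports an `ssl_version` string containing "SecureTransport". `uses_secure_transport` is a hypothetical helper, and a FALSE result says nothing about the fork safety of other libraries loaded in the session.

```r
# Hypothetical helper: TRUE if the SSL backend string names Apple's native
# TLS (Secure Transport), the backend described above as not fork-safe.
# By default it inspects the libcurl that the curl package is linked against.
uses_secure_transport <- function(ssl_version = curl::curl_version()$ssl_version) {
  grepl("SecureTransport", ssl_version, fixed = TRUE)
}

uses_secure_transport("LibreSSL/2.0.20")  # FALSE
uses_secure_transport("SecureTransport")  # TRUE
```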

[image not preserved]

asieira avatar Nov 24 '17 13:11 asieira

| => otool -L /usr/lib/libcurl.3.dylib
/usr/lib/libcurl.3.dylib:
	/usr/lib/libcurl.4.dylib (compatibility version 7.0.0, current version 9.0.0)
	/usr/lib/libcrypto.35.dylib (compatibility version 36.0.0, current version 36.0.0)
	/usr/lib/libssl.35.dylib (compatibility version 36.0.0, current version 36.0.0)
	/System/Library/Frameworks/LDAP.framework/Versions/A/LDAP (compatibility version 1.0.0, current version 2.4.0)
	/System/Library/Frameworks/Kerberos.framework/Versions/A/Kerberos (compatibility version 5.0.0, current version 6.0.0)
	/usr/lib/libapple_nghttp2.dylib (compatibility version 1.0.0, current version 1.24.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)

asieira avatar Nov 24 '17 13:11 asieira

@asieira what do you see for curl_version() in R? Can you try reinstalling the curl package from source?

jeroen avatar Nov 24 '17 13:11 jeroen

"Apple has switched back from native SSL to LibreSSL in OSX 10.13 so now curl is fork-safe again"

Well, OK, that's good. But of course it will not work on previous OSX versions, and who knows if it will work in the future?

Plus other things might fail as well, because it still links to other system libs.

I think it is just better not to use curl with mclapply....

gaborcsardi avatar Nov 24 '17 13:11 gaborcsardi

"I think it is just better not to use curl with mclapply...."

Yes, I agree 100%; I already mentioned that above. But he really wants it, and I think it should be possible on High Sierra. However, his code will not be portable.

jeroen avatar Nov 24 '17 13:11 jeroen
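
One way to follow the advice above is to keep every curl call in the parent process and fork only for the CPU-bound part. The sketch below assumes the work can be split that way, which may not hold for the pipeline described earlier in the thread. `process_in_two_phases`, `fetch` and `process` are hypothetical names.

```r
library(parallel)

# Phase 1 runs in the parent process only; phase 2 forks but never calls curl.
# `fetch` and `process` are hypothetical user-supplied functions.
process_in_two_phases <- function(inputs, fetch, process, mc.cores = 2L) {
  # All network I/O happens here, before any fork.
  payloads <- lapply(inputs, fetch)

  # CPU-bound work is forked; the children only see plain R objects.
  mclapply(payloads, process, mc.cores = mc.cores)
}
```

For example, `fetch` could wrap `httr::HEAD()` and `process` could do the expensive parsing of each response. On Windows, `mclapply()` requires `mc.cores = 1`.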

Also, you cannot just call curl_global_cleanup before forking. What if other packages have active curl handles?

gaborcsardi avatar Nov 24 '17 13:11 gaborcsardi

Oh right, I didn't even see that. @asieira you should just use parallel::mclapply() instead of your own version.

jeroen avatar Nov 24 '17 13:11 jeroen

Same problem:

> parallel::mclapply(c("https://www.google.com", "https://www.facebook.com"), HEAD)
[[1]]
[1] "Error in curl::curl_fetch_memory(url, handle = handle) : \n  SSL certificate problem: Invalid certificate chain\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in curl::curl_fetch_memory(url, handle = handle): SSL certificate problem: Invalid certificate chain>

[[2]]
[1] "Error in curl::curl_fetch_memory(url, handle = handle) : \n  SSL certificate problem: Invalid certificate chain\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in curl::curl_fetch_memory(url, handle = handle): SSL certificate problem: Invalid certificate chain>

Warning message:
In parallel::mclapply(c("https://www.google.com", "https://www.facebook.com"),  :
  all scheduled cores encountered errors in user code

asieira avatar Nov 24 '17 13:11 asieira

Just to add to the problems: detach(..., unload = TRUE) does not actually unload the shared library:

❯ library(curl)

✔ 65.7 MiB master* ↑
❯ detach("package:curl", unload=TRUE)

✔ 65.7 MiB master* ↑
❯ getLoadedDLLs()
                                                                                            Filename
base                                                                                            base
methods       /Library/Frameworks/R.framework/Versions/3.4/Resources/library/methods/libs/methods.so
grDevices /Library/Frameworks/R.framework/Versions/3.4/Resources/library/grDevices/libs/grDevices.so
graphics    /Library/Frameworks/R.framework/Versions/3.4/Resources/library/graphics/libs/graphics.so
utils             /Library/Frameworks/R.framework/Versions/3.4/Resources/library/utils/libs/utils.so
stats             /Library/Frameworks/R.framework/Versions/3.4/Resources/library/stats/libs/stats.so
memuse                                               /Users/gaborcsardi/r_pkgs/memuse/libs/memuse.so
curl                                                     /Users/gaborcsardi/r_pkgs/curl/libs/curl.so
tools             /Library/Frameworks/R.framework/Versions/3.4/Resources/library/tools/libs/tools.so
          Dynamic.Lookup
base               FALSE
methods            FALSE
grDevices          FALSE
graphics           FALSE
utils              FALSE
stats              FALSE
memuse             FALSE
curl                TRUE
tools              FALSE

gaborcsardi avatar Nov 24 '17 13:11 gaborcsardi
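
The listing above can be reduced to a one-line check. A small sketch using base R's `getLoadedDLLs()`; `dll_loaded` is a hypothetical helper name.

```r
# Hypothetical helper: is a package's shared library still mapped into
# this R process? getLoadedDLLs() is part of base R.
dll_loaded <- function(pkg) {
  pkg %in% names(getLoadedDLLs())
}

dll_loaded("base")  # TRUE
```

Running `dll_loaded("curl")` after `detach("package:curl", unload = TRUE)` shows whether the library is still loaded. R only unloads a package's DLL if the package itself does so in its `.onUnload` hook.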

[screenshot, 2017-11-24 at 2:32:33 pm; image not preserved]

jeroen avatar Nov 24 '17 13:11 jeroen

I'm guessing @asieira may have something else loaded in his process that is not fork safe. Are you running a completely vanilla R session in the R terminal?

jeroen avatar Nov 24 '17 13:11 jeroen

Seems like that is the case indeed:

R version 3.4.2 (2017-09-28) -- "Short Summer"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

NULL
> library(httr)
> parallel::mclapply(c("https://www.google.com", "https://www.facebook.com"), HEAD)
[[1]]
Response [https://www.google.com.br/?gws_rd=cr&dcr=0&ei=-iAYWpvwNoLwwASViq64Bw]
  Date: 2017-11-24 13:39
  Status: 200
  Content-Type: text/html; charset=ISO-8859-1
<EMPTY BODY>

[[2]]
Response [https://www.facebook.com/unsupportedbrowser]
  Date: 2017-11-24 13:39
  Status: 200
  Content-Type: text/html; charset=UTF-8
<EMPTY BODY>

Funnily enough, when I used R.app to do the same thing it just hung. This only worked for me on Terminal. :(

asieira avatar Nov 24 '17 13:11 asieira

Can anyone else confirm if you have the same SSL issue I had with parallel::mclapply when running inside RStudio? Maybe that's what's making a difference.

asieira avatar Nov 24 '17 13:11 asieira

Works fine for me in rstudio and R.app. What else is in your sessionInfo()?

jeroen avatar Nov 24 '17 13:11 jeroen

Lots of things:

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] compiler  tools     stats     graphics  grDevices datasets  parallel  utils     methods  
[10] base     

other attached packages:
 [1] xml2_1.1.1          punycode_0.2.5      dplyr_0.5.0         base64enc_0.1-3    
 [5] RJSONIO_1.3-0       digest_0.6.12       bitops_1.0-6        RApiSerialize_0.1.0
 [9] bit64_0.9-7         bit_1.1-12          urltools_1.6.0.9000 curl_3.0           
[13] SnakeCharmR_1.0.6   igraph_1.1.2        R.utils_2.5.0       R.oo_1.21.0        
[17] R.methodsS3_1.7.1   Rcpp_0.12.13        stringdist_0.9.4.6  gdata_2.18.0       
[21] pryr_0.1.3          testthat_1.0.2      httr_1.3.1          stringr_1.2.0      
[25] hash_2.2.6          data.table_1.9.6    futile.logger_1.4.3 jsonlite_1.5       

loaded via a namespace (and not attached):
 [1] futile.options_1.0.0 tibble_1.3.4         rlang_0.1.2          pkgconfig_2.0.1     
 [5] DBI_0.7              rstudioapi_0.7       yaml_2.1.14          gtools_3.5.0        
 [9] triebeard_0.3.0      R6_2.2.2             lambda.r_1.2         magrittr_1.5        
[13] codetools_0.2-15     assertthat_0.2.0     stringi_1.1.5        chron_2.3-51        
[17] crayon_1.3.4     

asieira avatar Nov 24 '17 14:11 asieira

So the problem is likely not curl but one of those things that is not fork safe.

jeroen avatar Nov 24 '17 14:11 jeroen

Reduced the scope even further with a clean RStudio session... I don't get the SSL error anymore, but don't get a correct response either:


R version 3.4.2 (2017-09-28) -- "Short Summer"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

NULL
> library(httr)
> parallel::mclapply(c("https://www.google.com", "https://www.facebook.com"), HEAD)
[[1]]
NULL

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  parallel  utils     methods   base     

other attached packages:
[1] httr_1.3.1          futile.logger_1.4.3

loaded via a namespace (and not attached):
[1] compiler_3.4.2       R6_2.2.2             tools_3.4.2          lambda.r_1.2        
[5] yaml_2.1.14          futile.options_1.0.0

asieira avatar Nov 24 '17 14:11 asieira

This seems relevant: https://curl.haxx.se/mail/lib-2013-03/0226.html

asieira avatar Nov 27 '17 10:11 asieira

Hi @jeroen and @asieira, I was wondering whether you managed to find a solution for this. Many thanks!

shajoezhu avatar Jul 11 '21 00:07 shajoezhu