curl
Multi reset?
Hi Jeroen - I'm enjoying the curl package a lot - thanks! I'm a total noob to curl, so of course I'm trying to do the more advanced stuff first... so here goes my code:
zz <- file("output.txt", open = "wb")
pool <- new_pool(total_con = 125, host_con = 125, multiplex = FALSE)
for (i in 1:5000) {
  h1 <- new_handle(url = get_These_trade_dates[i])
  multi_add(h1, done = success, fail = failure, data = zz, pool = pool)
}
multi_run(pool = pool)
close(zz)
data <- as.data.table(read.table(file = "output.txt", sep = ",", header = TRUE, colClasses = "character"))
unlink("output.txt")
So sometimes this returns the complete 5000 trade dates and all the correct data, but when I try to run this exact code again it doesn't always return all the dates. Sometimes when I completely exit RStudio and come back in it will work, but not always. Is there some sort of reset that needs to be done? If so, how would I do that? (The host is a cloud operator that supports up to 500 connections, btw.)
Thanks in advance for your advice! - John
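For reference, the `success` and `failure` callbacks referenced in the snippet above are not defined in the post; a rough sketch of what they might look like (the names and behavior here are assumptions, not the poster's actual code) is:

```r
zz <- file("output.txt", open = "wb")  # same shared connection as in the snippet

# Hypothetical `done` callback: append the raw response body to the connection
success <- function(res) {
  writeBin(res$content, zz)  # res$content is a raw vector
}

# Hypothetical `fail` callback: msg is a character string describing the error
failure <- function(msg) {
  message("Request failed: ", msg)
}
```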
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.10.4-3 curl_3.1
loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3
> curl_version()
$version
[1] "7.56.1"
$ssl_version
[1] "(OpenSSL/1.1.0f) WinSSL"
$libz_version
[1] "1.2.8"
$libssh_version
[1] "libssh2/1.8.0"
$libidn_version
[1] NA
$host
[1] "x86_64-w64-mingw32"
$protocols
[1] "dict" "file" "ftp" "ftps" "gopher" "http" "https" "imap" "imaps" "ldap" "ldaps" "pop3"
[13] "pop3s" "rtsp" "scp" "sftp" "smtp" "smtps" "telnet" "tftp"
$ipv6
[1] TRUE
$http2
[1] FALSE
$idn
[1] TRUE
Update:
This seems to run OK in repeated trials (changes marked with ** **):
zz <- file("output.txt", open = "wb")
pool <- new_pool(total_con = 125, host_con = 125, multiplex = FALSE)
for (i in 1: **1000** ) {
  h1 <- new_handle(url = get_These_trade_dates[i])
  multi_add(h1, done = NULL, fail = NULL, data = zz, pool = pool)
}
multi_run(pool = pool, **poll = 1000**)
close(zz)
data <- as.data.table(read.table(file = "output.txt", sep = ",", header = TRUE, colClasses = "character"))
unlink("output.txt")
What are you trying to do exactly? You are running 5000 concurrent requests and they all write to the same file? So all the output gets mixed up?
The output isn't mixed up as I key the data.table later. Each trade date returns exactly the same (71) columns. And yes, I'm trying to write the data to only one file. I'm essentially having to pull each individual trade date, write the data to a file, make it a data.table, and then key it appropriately.
OK, but the way you do it right now, all the requests write to the same file simultaneously, so you can get corruption. It's better to buffer everything in memory and write it to a file later on.
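A minimal sketch of that buffering approach, assuming `get_These_trade_dates` is a character vector of URLs (everything else here is illustrative, not the poster's actual code): collect each response body in a list via the `done` callback, and only write the file once after `multi_run()` returns:

```r
library(curl)

# Fetch a vector of URLs concurrently, buffering each raw response body
# in memory instead of streaming them all into one shared file.
fetch_all <- function(urls, total_con = 125, host_con = 125) {
  pool <- new_pool(total_con = total_con, host_con = host_con)
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    local({
      idx <- i  # capture the loop index for this request's callbacks
      multi_add(new_handle(url = urls[idx]),
        done = function(res) results[[idx]] <<- res$content,
        fail = function(msg) message("Request ", idx, " failed: ", msg),
        pool = pool)
    })
  }
  multi_run(pool = pool)
  results
}

# Usage (assumed vector of URLs from the original post):
# bodies <- fetch_all(get_These_trade_dates)
# writeBin(unlist(bodies), "output.txt")   # single write, after all requests finish
```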
Ah OK, I thought maybe that's what was doing it - something getting corrupted along the way. I'll try to buffer and go from there. I think I had originally used readLines() somewhere in all this; I'll try something like that again. Thank you!
In my query I can get the raw bytes from the content vector, just like you do in your examples. The problem is I have embedded nulls and can't get anything to work to remove them. I'm on the right track, because I could get this:
> cat(rawToChar(results[[1000]]$content))
"Could not initiate UNO connection"
And
> cat(rawToChar(results[[1000]]$headers))
HTTP/1.1 401 Unauthorized
Date: Thu, 01 Feb 2018 21:23:30 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 951
Connection: keep-alive
Server: Apache-Coyote/1.1
WWW-Authenticate: Basic realm=xxx_api
Content-Language: en
Strict-Transport-Security: max-age=31536000
HTTP/1.1 400 Bad Request
Date: Thu, 01 Feb 2018 21:23:57 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
Server: Apache-Coyote/1.1
Strict-Transport-Security: max-age=31536000
If I can remove the embedded nulls from the other iterations I wouldn't have to write to memory.
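One possible approach, as a sketch: since `rawToChar()` errors on a raw vector containing `00` bytes, the nulls can be filtered out of the raw vector before converting (the byte values below are just an illustration):

```r
# A raw vector with an embedded null byte
x <- as.raw(c(0x48, 0x69, 0x00, 0x21))  # "Hi", null, "!"

# Drop the null bytes, then convert
clean <- x[x != as.raw(0)]
rawToChar(clean)  # "Hi!"
```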