
Multi reset?

Open JKcme opened this issue 7 years ago • 6 comments

Hi Jeroen - I'm enjoying the curl package a lot, thanks! I'm a total noob to curl, so of course I'm trying to do the more advanced stuff first... so here is my code:

zz <- file("output.txt", open = "wb")
pool <- new_pool(total_con = 125, host_con = 125, multiplex = F)
for(i in 1:5000){
  h1 <- new_handle(url=get_These_trade_dates[i])
  multi_add(h1, done = success, fail = failure, data = zz, pool = pool)
}

multi_run(pool = pool)
close(zz)
data <- as.data.table(read.table(file="output.txt",sep = ",",header=T, colClasses = "character"))
unlink("output.txt")

So sometimes this returns the complete 5000 trade dates and all the correct data, but when I try to run this exact code again it doesn't always return all the dates. Sometimes when I completely exit RStudio and come back in it will work, but not always. Is there some sort of reset that needs to be done? If so, how would I do that? (The host is a cloud operator that supports up to 500 connections, btw.)
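On the "reset" question, here is a minimal sketch of how a pool could be emptied by hand, assuming leftover queued requests are the concern (the thread does not confirm that they are the cause). `multi_list()` and `multi_cancel()` are part of the curl package; `clear_pool` is a hypothetical helper name.

```r
library(curl)

# Cancel every request still queued in `pool` and return how many were removed.
# With pool = NULL, curl falls back to its global default pool.
clear_pool <- function(pool = NULL) {
  pending <- multi_list(pool = pool)
  for (h in pending) {
    multi_cancel(h)
  }
  length(pending)
}
```

Calling `clear_pool(pool)` before queueing a new batch would ensure nothing from an earlier, interrupted run is still attached to the pool.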

Thanks in advance for your advice! - John

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.10.4-3 curl_3.1

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3

> curl_version()
$version
[1] "7.56.1"

$ssl_version
[1] "(OpenSSL/1.1.0f) WinSSL"

$libz_version
[1] "1.2.8"

$libssh_version
[1] "libssh2/1.8.0"

$libidn_version
[1] NA

$host
[1] "x86_64-w64-mingw32"

$protocols
 [1] "dict" "file" "ftp" "ftps" "gopher" "http" "https" "imap" "imaps" "ldap" "ldaps" "pop3"
[13] "pop3s" "rtsp" "scp" "sftp" "smtp" "smtps" "telnet" "tftp"

$ipv6
[1] TRUE

$http2
[1] FALSE

$idn
[1] TRUE

JKcme • Feb 01 '18 15:02

Update:

This seems to run ok in repeat trials (changes in ** **):

zz <- file("output.txt", open = "wb")
pool <- new_pool(total_con = 125, host_con = 125, multiplex = F)

for(i in 1: **1000** ){
  h1 <- new_handle(url=get_These_trade_dates[i])
  multi_add(h1, done = NULL, fail = NULL, data = zz, pool = pool)
}

multi_run(pool = pool, **poll=1000**)
close(zz)
data <- as.data.table(read.table(file="output.txt",sep = ",",header=T, colClasses = "character"))
unlink("output.txt")
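A way to check whether a run really finished, sketched under the assumption that the curl documentation is read correctly: `multi_run()` returns the counts of successful, failed, and still-pending requests, and an integer `poll` makes it return once that many requests have completed. `run_pool` is a hypothetical helper name.

```r
library(curl)

# Run everything queued in `pool` and report how the requests ended.
# With the default timeout (Inf) and poll (FALSE), multi_run() blocks
# until no request is pending.
run_pool <- function(pool) {
  counts <- multi_run(pool = pool)
  message("success: ", counts$success,
          ", error: ", counts$error,
          ", pending: ", counts$pending)
  invisible(counts)
}
```

If `pending` is greater than zero after a call, some requests were never completed, which would show up as missing trade dates in the output.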

JKcme • Feb 01 '18 18:02

What are you trying to do exactly? You are running 5000 concurrent requests and they all write to the same file? So all the output gets mixed up?

jeroen • Feb 01 '18 19:02

The output isn't mixed up as I key the data.table later. Each trade date returns exactly the same (71) columns. And yes, I'm trying to write the data to only one file. I'm essentially having to pull each individual trade date, write the data to a file, make it a data.table, and then key it appropriately.

JKcme • Feb 01 '18 19:02

OK but the way you do it right now, all the requests are writing to the same file simultaneously so you can get corruptions. It's better to buffer everything in memory and write it to a file later on.
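The buffering approach could look roughly like the sketch below. It assumes the `done` callback receives a response list with a raw `content` field, as in the curl package examples; `fetch_all` is a hypothetical helper, and `urls` stands in for the `get_These_trade_dates` vector from the question.

```r
library(curl)

# Fetch all `urls` concurrently and keep each response in memory,
# stored at the position of the URL that produced it.
fetch_all <- function(urls, total_con = 125, host_con = 125) {
  pool <- new_pool(total_con = total_con, host_con = host_con)
  results <- vector("list", length(urls))
  failures <- list()

  for (i in seq_along(urls)) {
    local({
      idx <- i  # capture the index for this request's callbacks
      h <- new_handle(url = urls[idx])
      multi_add(
        h,
        done = function(res) results[[idx]] <<- res,
        fail = function(msg) failures[[as.character(idx)]] <<- msg,
        pool = pool
      )
    })
  }

  multi_run(pool = pool)
  list(results = results, failures = failures)
}

# Usage (not run here, needs the real URLs):
# out <- fetch_all(get_These_trade_dates)
# bodies <- lapply(Filter(Negate(is.null), out$results), function(r) r$content)
# writeBin(unlist(bodies), "output.txt")
```

Because each response lands in its own list slot, nothing is interleaved, and the single write to `output.txt` happens only after every request has finished.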

jeroen • Feb 01 '18 20:02

Ah OK, I thought maybe that's what was doing it - something getting corrupted along the way. I'll try to buffer and go from there. I think I had originally used readLines() somewhere in all this; I'll try something like that again. Thank you!

JKcme • Feb 01 '18 20:02

In my query I can get the raw bytes from the content vector....just like you do in your examples. Problem is I have embedded nulls and can't get anything to work to remove them. I'm on the right track because I could get this:

> cat(rawToChar(results[[1000]]$content))
"Could not initiate UNO connection"

And

> cat(rawToChar(results[[1000]]$headers))
HTTP/1.1 401 Unauthorized
Date: Thu, 01 Feb 2018 21:23:30 GMT
Content-Type: text/html;charset=utf-8
Content-Length: 951
Connection: keep-alive
Server: Apache-Coyote/1.1
WWW-Authenticate: Basic realm=xxx_api
Content-Language: en
Strict-Transport-Security: max-age=31536000

HTTP/1.1 400 Bad Request
Date: Thu, 01 Feb 2018 21:23:57 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
Server: Apache-Coyote/1.1
Strict-Transport-Security: max-age=31536000
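The 401 and 400 status lines above suggest that some requests are failing on the server side. A small sketch for finding them, assuming `results` is a list of the response lists that curl hands to a `done` callback (each with a `status_code` field); `failed_requests` is a hypothetical helper name.

```r
# Return the positions in `results` whose request did not succeed:
# either no response was stored (NULL) or the HTTP status is 400 or above.
failed_requests <- function(results) {
  codes <- vapply(
    results,
    function(r) if (is.null(r)) NA_integer_ else as.integer(r$status_code),
    integer(1)
  )
  which(is.na(codes) | codes >= 400L)
}
```

The returned indices can be used to retry only the affected trade dates instead of rerunning the whole batch.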

If I can remove the embedded nulls from the other iterations I wouldn't have to write to memory.
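One possible way to get past the embedded nulls, assuming the zero bytes are padding and not part of a binary or UTF-16 payload: drop them from the raw vector before calling `rawToChar()`, which otherwise stops with an "embedded nul" error. `strip_nuls` is a hypothetical helper name.

```r
# Convert a raw vector to a character string after removing zero bytes.
strip_nuls <- function(x) {
  stopifnot(is.raw(x))
  rawToChar(x[x != as.raw(0)])
}

# Example: bytes for "a", NUL, "b"
strip_nuls(as.raw(c(0x61, 0x00, 0x62)))
```

Applied to a response, this would be `strip_nuls(results[[i]]$content)`.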

JKcme • Feb 01 '18 23:02