Timeout error when attempting to retrieve data

Open danfished opened this issue 8 months ago • 5 comments

I get a curl timeout error when attempting to download data using the example code from the USGS page, example below:

library("dataRetrieval")

siteNo <- "01540500"
pCode <- "00060"
start.date <- "2022-08-01"
end.date <- "2022-09-30"

danville <- readNWISuv(siteNumbers = siteNo, parameterCd = pCode, startDate = start.date, endDate = end.date)

Error message:

"Error in curl::curl_fetch_memory(url, handle = handle): Timeout was reached: [nwis.waterservices.usgs.gov] Failed to connect to nwis.waterservices.usgs.gov port 443 after 7519 ms: Timed out Request failed [ERROR]. Retrying in 1 seconds..."

It usually will try two more times before final timeout. I have tried to increase timeout limit, but it always times out around 7000 ms.
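
For anyone reproducing the timeout change, here is a minimal sketch of how the limits could be raised through httr, which this version of dataRetrieval appears to use for its requests. The 60 and 30 second values are arbitrary assumptions. Because the error says the connection itself failed ("Failed to connect ... port 443"), a longer timeout may well make no difference if a firewall is dropping the traffic.

```r
library(httr)

# Overall request limit in seconds (httr::timeout sets curl's timeout_ms).
set_config(timeout(60))

# Separate limit for the connection phase, in seconds
# (curl's connecttimeout option).
set_config(config(connecttimeout = 30))

# Inspect what is now set globally for httr requests.
getOption("httr_config")$options
```

Both settings only last for the current R session.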

I am attempting to use dataRetrieval on a state network/state-issued computer. I have reached out to our IT department, but their only suggestion was to update R and RStudio. That raises another issue that others and I have run into with updates: without the right combination of R and RStudio versions, our network won't download packages properly either. We also tried updating the .Renviron file with the following, but it didn't seem to change anything:

http_proxy=http://proxy.state.gov/
http_proxy_user=user:pw

http_proxy=http://waterservices.usgs.gov/
http_proxy_user=user:pw
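
For comparison, here is a minimal sketch of the proxy variables that libcurl (the library underneath the curl and httr packages) reads, set from within R so they can be tried without editing .Renviron. The proxy address is a placeholder. Two assumptions worth checking with IT: the variable should point at the proxy server itself rather than at waterservices, and since the USGS service is on port 443 (HTTPS), https_proxy is probably the one that matters.

```r
# Placeholder proxy address; replace with the value IT provides.
proxy_url <- "http://proxy.example.gov:8080"

# libcurl reads these environment variables when it makes a request.
# https_proxy applies to https:// URLs such as the USGS water services.
Sys.setenv(http_proxy = proxy_url, https_proxy = proxy_url)

# Confirm what the current R session sees.
Sys.getenv(c("http_proxy", "https_proxy"))
```

In .Renviron the same settings would be written as two separate NAME=value lines, one per variable.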

I have spoken with people at other offices who are having the same issue, so it clearly appears to be a problem with our network/firewall/proxy settings. If someone could provide any insight for my simple brain to pass on to IT, it would be greatly appreciated.

Session info:

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
system code page: 65001

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dataRetrieval_2.7.14

loaded via a namespace (and not attached):
[1] Rcpp_1.0.10 rstudioapi_0.15.0 magrittr_2.0.3 units_0.8-1 tidyselect_1.2.0 R6_2.5.1
[7] rlang_1.1.0 fansi_1.0.4 httr_1.4.6 dplyr_1.1.2 tools_4.1.2 grid_4.1.2
[13] KernSmooth_2.23-20 utf8_1.2.3 cli_3.6.1 e1071_1.7-13 DBI_1.1.3 class_7.3-21
[19] tibble_3.2.1 lifecycle_1.0.3 sf_1.0-12 vctrs_0.6.1 curl_5.0.0 glue_1.6.2
[25] proxy_0.4-27 compiler_4.1.2 pillar_1.9.0 generics_0.1.3 classInt_0.4-9 pkgconfig_2.0.3

danfished, Dec 19 '23 20:12

Hello @danfished and thanks for the question. Doing some digging, but just an FYI that many of our resident experts are away for the holidays so it will take a bit longer than normal.

lstanish-usgs, Dec 22 '23 14:12

No guarantees that this will solve the problem, but the URL for waterservices should be https://waterservices.usgs.gov/

lstanish-usgs, Dec 22 '23 17:12

(Copying this from #270):

You'll have to set up the R client to use the proxy. It should hopefully be fairly straightforward. I don't have a proxy to test this against, but...

This should hopefully get you your proxy info. If not, the config script should.

curl::ie_proxy_info()

Then, you need to set up httr to use that proxy. There is a command, use_proxy, that needs to be fed into set_config.

library(httr)
set_config(use_proxy(url="abc.com",port=8080, username="username", password="password"))

Note: Username and pass may be optional.

That should hopefully fix your issue, but keep in mind that it only lasts for the life of your R session. You'd need to re-run it when you restart R, or put it into an .Rprofile file so it runs every time on startup.
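
A minimal sketch of what that startup code could look like. One caveat: .Renviron only holds NAME=value lines, so R calls like set_config() would need to go in .Rprofile, which R sources at startup. The host and port below are placeholders, not a real proxy.

```r
# ~/.Rprofile (sketch): apply a proxy to every httr request in new sessions.
# "proxy.example.gov" and 8080 are placeholder values.
if (requireNamespace("httr", quietly = TRUE)) {
  httr::set_config(
    httr::use_proxy(url = "proxy.example.gov", port = 8080)
  )
}
```

The requireNamespace() guard is there so R still starts cleanly if httr is not installed in a given library.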

Let me know if that doesn't solve the problem.

ldecicco-USGS, Dec 27 '23 16:12

@ldecicco-USGS Thanks for your reply. I was able to verify I'm using the correct proxy (though I'm still not 100% sure that 8080 is the correct port):

curl::ie_proxy_info()
$AutoDetect
[1] FALSE

$AutoConfigUrl
[1] "http://o365proxy.pa.gov"

$Proxy
NULL

$ProxyBypass
NULL

Unfortunately, I still get a similar timeout error after using set_config, both with and without username/pw.

danfished, Dec 27 '23 16:12

So what exactly (OK, not exactly... don't paste in a password) do you have written in your .Renviron? Is it:

library(httr)
set_config(use_proxy(url="o365proxy.pa.gov",
                     port=8080, 
                     username="user:pw"))

(Above, it sounded like you might be putting the waterservices URL into the use_proxy function; that would not be correct.)

Assuming that looks good above, do you know if you use a PAC file? What do you get when you run:

curl::ie_get_proxy_for_url("https://waterservices.usgs.gov")

I ask because it sounds like that might be a slightly different way to deal with proxies:

https://www.opencpu.org/posts/curl-release-0-9-2/
https://stackoverflow.com/questions/33538695/how-to-tell-r-to-use-proxy-auto-config-script-pac-in-windows
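
If ie_get_proxy_for_url() does return something, here is a rough sketch of feeding it into httr. I'm assuming the result is a single "host:port" string (possibly with a scheme in front); parse_proxy() is a hypothetical helper written for this example, not part of curl or httr.

```r
# Hypothetical helper: split "host:port" (optionally "scheme://host:port").
parse_proxy <- function(x) {
  x <- sub("^[A-Za-z]+://", "", x)  # drop a leading scheme, if any
  x <- sub("/+$", "", x)            # drop trailing slashes
  parts <- strsplit(x, ":", fixed = TRUE)[[1]]
  port <- if (length(parts) > 1) as.numeric(parts[2]) else NULL
  list(host = parts[1], port = port)
}

# The ie_* functions read Windows proxy settings, so only try this on Windows.
if (.Platform$OS.type == "windows" &&
    requireNamespace("curl", quietly = TRUE) &&
    requireNamespace("httr", quietly = TRUE)) {
  found <- curl::ie_get_proxy_for_url("https://waterservices.usgs.gov")
  # Assumption: a usable result is one non-empty string.
  if (!is.null(found) && length(found) == 1 && nzchar(found)) {
    p <- parse_proxy(found)
    httr::set_config(httr::use_proxy(url = p$host, port = p$port))
  }
}
```

If the function returns NULL, nothing is changed, which would itself be useful to know.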

We can also set options in httr like this:

library(httr)
set_config(verbose())
set_config(progress())
daily <- readNWISdv("05427718", "00060")

I'm not sure if the "verbose" output would tell us any more information, but it's worth a shot I guess.

I think another thing to try is some other examples to make sure they are working. Can you get all of these lines to work (but using your proxy information)?

https://gist.github.com/jeroen/5127c288f8914bdb20be

Here are some links I've been looking at. It's always tricky to help with proxy questions because we don't have a proxy to work with.

https://stackoverflow.com/questions/6467277/proxy-setting-for-r
https://stackoverflow.com/questions/4832560/how-do-i-tell-the-r-interpreter-how-to-use-the-proxy-server
https://github.com/jeroen/curl/issues/1

ldecicco-USGS, Dec 27 '23 17:12