
ne_download URL strange/wrong

Open rix133 opened this issue 5 years ago • 22 comments

So after a clean install of the latest "rnaturalearth" on Windows 7 (R 3.5.3):

The URL seems strange to me: http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip

Command that I ran: countries10 <- ne_download(scale = 10, type = 'countries', category = 'cultural', returnclass = "sf")

Error in utils::download.file(file.path(address), zip_file <- tempfile()) : cannot open URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip'
2.
utils::download.file(file.path(address), zip_file <- tempfile())
1.
ne_download(scale = 10, type = "countries", category = "cultural", returnclass = "sf")

rix133 avatar Jun 19 '19 15:06 rix133

Does the link www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip work in your browser?

Nowosad avatar Jun 19 '19 15:06 Nowosad

The command works from R for me and, as @Nowosad points out, it should work in a browser too (but you are right, it does look a bit strange). Must be another download issue?

andysouth avatar Jun 19 '19 15:06 andysouth

Does the link www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip work in your browser?

Actually it does, so it can't be a firewall or some other problem, I guess. I have no idea then and must investigate further. It fails with a clean install of R 3.6.0 as well, with the same message.

rix133 avatar Jun 19 '19 16:06 rix133

The code saves the downloaded file to a temporary location; it might be worth checking that this works for you: write.csv(data.frame(), tempfile())

andysouth avatar Jun 19 '19 16:06 andysouth

Try this to check connection from R: library(curl) curl::has_internet()

andysouth avatar Jun 19 '19 16:06 andysouth

Try this to check connection from R: library(curl) curl::has_internet()

TRUE

write.csv(data.frame(), tempfile()) gives no error

Furthermore, other URLs work, e.g.: utils::download.file(file.path('https://file-examples.com/wp-content/uploads/2017/02/zip_2MB.zip'), zip_file <- tempfile())

yields: trying URL 'https://file-examples.com/wp-content/uploads/2017/02/zip_2MB.zip' Content type 'application/zip' length 2036861 bytes (1.9 MB) downloaded 1.9 MB

Downloading this URL over http instead of https works as well.

rix133 avatar Jun 20 '19 07:06 rix133

UPDATE: It works if I specify the download method as libcurl, i.e.: utils::download.file(file.path(address), zip_file <- tempfile(), method = "libcurl")

So I can set a global option for ne_download():

options("download.file.method" = "libcurl")
countries10 <- ne_download(scale = 10, type = 'countries', category = 'cultural', returnclass = "sf")

yields

(trying URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip'
Content length 283 bytes
downloaded 4.7 MB)
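If setting the option globally feels heavy-handed, here is a sketch of scoping the workaround to a single call in base R (the ne_download() arguments are those from the original report; the old/options pattern just restores whatever was set before):

```r
# Sketch: apply the libcurl workaround for this one call only, then
# restore the previous value of the option (base R, no extra packages).
old <- options(download.file.method = "libcurl")
countries10 <- rnaturalearth::ne_download(
  scale = 10, type = "countries",
  category = "cultural", returnclass = "sf"
)
options(old)  # restore the previous download.file.method, if any
```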

Should I close the issue?

rix133 avatar Jun 20 '19 08:06 rix133

Thanks Richard, Good work finding a solution. Leave it open for now while I ask twitter if it's wise to set that option in the package.

andysouth avatar Jun 20 '19 09:06 andysouth

download.file uses this code to decide on the method:

    method <- if (missing(method)) 
        getOption("download.file.method", default = "auto")
    else match.arg(method, c("auto", "internal", "libcurl", "wget", 
        "curl", "lynx"))
    if (method == "auto") {
        if (length(url) != 1L || typeof(url) != "character") 
            stop("'url' must be a length-one character vector")
        method <- if (grepl("^file:", url)) 
            "internal"
        else "libcurl"
    }

so if method isn't supplied and download.file.method isn't set, method defaults to "auto", and then the method is "internal" if the URL starts with file: and "libcurl" otherwise. So if your download works with an explicit method = "libcurl" but not with a missing method argument, the default must be coming from somewhere. Are you sure you haven't set download.file.method to something else? Something that breaks on double slashes in a URL? That's the only odd thing in that URL... Maybe it parses the string up to the second // instead of the first?
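The selection logic quoted above can be pulled out into a tiny standalone function to check which method a given session would end up with (pick_method is a name invented here for illustration; it is not part of R or rnaturalearth):

```r
# Illustration only: the method-selection logic from download.file,
# reimplemented as a standalone function for inspection.
pick_method <- function(url,
                        opt = getOption("download.file.method", default = "auto")) {
  method <- opt
  if (method == "auto") {
    # "internal" only for file: URLs, "libcurl" for everything else
    method <- if (grepl("^file:", url)) "internal" else "libcurl"
  }
  method
}

pick_method("file:///tmp/x.zip", opt = "auto")       # "internal"
pick_method("http://example.com/x.zip", opt = "auto") # "libcurl"
```

Running it with opt = "wininet" shows how a stray global option (e.g. one set by an IDE) overrides the auto logic entirely.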

barryrowlingson avatar Jun 20 '19 09:06 barryrowlingson

Sure you haven't set the download.file.method to something else?

So the error appears in both the RStudio console and the R console. options("download.file.method") returns NULL in the R console and "wininet" in the latest RStudio.

I looked at the method definition in R 3.5.3 (Windows) and it seems to default to "wininet" when "auto"; snippet from download.file:

if (method == "auto") {
    if (length(url) != 1L || typeof(url) != "character") 
      stop("'url' must be a length-one character vector")
    method <- if (grepl("^ftps:", url) && capabilities("libcurl")) 
      "libcurl"
    else "wininet"
  }
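Given that Windows defaults to "wininet", a quick diagnostic sketch is to try the problematic URL with both methods and see which one fails (network required; the URL is the one from the original report):

```r
# Diagnostic sketch (Windows, network required): attempt the download
# with each method and report success or the error message.
url <- paste0("http://www.naturalearthdata.com/http//",
              "www.naturalearthdata.com/download/10m/cultural/",
              "ne_10m_admin_0_countries.zip")
for (m in c("wininet", "libcurl")) {
  res <- tryCatch({
    utils::download.file(url, tempfile(), method = m, quiet = TRUE)
    "ok"
  }, error = function(e) conditionMessage(e))
  message(m, ": ", res)
}
```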

rix133 avatar Jun 20 '19 10:06 rix133

Having the same issue on Linux, R version 4.0.3 with rnaturalearth version 0.1.0 and 0.2.0.

options("download.file.method")
$download.file.method
[1] "libcurl"

ne_download() returns a weird link (no problem with the internet connection, curl::has_internet() returns TRUE; the link does not work in the browser either, with or without a firewall; WordPress returns "There has been a critical error on your website.")

ne_download(scale = 10, type = 'rivers_lake_centerlines', category = 'physical', destdir = destdir, load = FALSE)

rnaturalearth version 0.1.0 returns:

trying URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_rivers_lake_centerlines.zip'
Error in utils::download.file(file.path(address), zip_file <- tempfile()) : 
  cannot open URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_rivers_lake_centerlines.zip'

and rnaturalearth version 0.2.0 returns http status 500:

trying URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/raster/GRAY_HR_SR.zip'
download failed
NULL
Warning message:
In utils::download.file(file.path(address), zip_file <- tempfile()) :
  cannot open URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/raster/GRAY_HR_SR.zip': HTTP status was '500 Internal Server Error'

Any ideas how to solve this yet?

petrpajdla avatar Dec 10 '20 12:12 petrpajdla

Same issue here

rmgriffin avatar Dec 30 '20 03:12 rmgriffin

Same issue on Windows 10 with rnaturalearth 0.1.0 and R 4.3

trying URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/physical/ne_50m_coastline.zip'
Error in utils::download.file(file.path(address), zip_file <- tempfile()) : 
  cannot open URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/physical/ne_50m_coastline.zip'
In addition: Warning message:
In utils::download.file(file.path(address), zip_file <- tempfile()) :
  InternetOpenUrl failed: 'The operation timed out

Grelot avatar Jan 13 '21 19:01 Grelot

I believe this is related to 'http' being used instead of 'https' in the ne_file_name function. Is there a way to change this locally so it works?

function (scale = 110, type = "countries", category = c("cultural", 
  "physical", "raster"), full_url = FALSE) 
{
  scale <- check_scale(scale)
  category <- match.arg(category)
  if (type %in% c("countries", "map_units", "map_subunits", 
    "sovereignty", "tiny_countries", "boundary_lines_land", 
    "pacific_groupings", "breakaway_disputed_areas", "boundary_lines_disputed_areas", 
    "boundary_lines_maritime_indicator")) {
    type <- paste0("admin_0_", type)
  }
  if (type == "states") 
    type <- "admin_1_states_provinces_lakes"
  if (category == "raster") {
    file_name <- paste0(type)
  }
  else {
    file_name <- paste0("ne_", scale, "m_", type)
  }
  if (full_url) 
    file_name <- paste0("http://www.naturalearthdata.com/http//", 
      "www.naturalearthdata.com/download/", scale, "m/", 
      category, "/", file_name, ".zip")
  return(file_name)
}

Edit: I changed that in the function and it's still not working. Could it be something related to a firewall?

bienflorencia avatar Apr 28 '21 18:04 bienflorencia

It appears to be a new issue on the website. If you go directly to the website and browse, you get the same link as in the R package (though https:// vs http://). Clicking the link invokes an onclick event that goes through urchinTracker; copying the link into a browser fails with a WordPress error. Possibly some issue with redirection?

Thus, the ne_download() function is currently broken in the R package. It seems that the natural earth data folk have been made aware of this, see here: https://github.com/nvkelso/natural-earth-vector/issues/528

jmarshallnz avatar May 06 '21 00:05 jmarshallnz

This should be fixed now.

nvkelso avatar May 07 '21 05:05 nvkelso

I just ran into this error with R 4.0.2, rnaturalearth 0.1.0 (https://github.com/ropensci/rnaturalearth/commit/543e3cbc2c913724ed66e742f0b8c38828ef1002, the current version in the repository), on Windows 10:

urban_areas <- rnaturalearth::ne_download(scale = 'large', type = 'urban_areas', returnclass = 'sf')
#> Warning in utils::download.file(file.path(address), zip_file <- tempfile()):
#> InternetOpenUrl failed: 'The server name or address could not be resolved'
#> Error in utils::download.file(file.path(address), zip_file <- tempfile()): cannot open URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_urban_areas.zip'

Created on 2021-09-24 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       America/Phoenix             
#>  date     2021-09-24                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package       * version date       lib source        
#>  assertthat      0.2.1   2019-03-21 [1] CRAN (R 4.0.2)
#>  backports       1.2.1   2020-12-09 [1] CRAN (R 4.0.3)
#>  class           7.3-17  2020-04-26 [2] CRAN (R 4.0.2)
#>  classInt        0.4-3   2020-04-07 [1] CRAN (R 4.0.2)
#>  cli             2.3.1   2021-02-23 [1] CRAN (R 4.0.4)
#>  crayon          1.4.1   2021-02-08 [1] CRAN (R 4.0.2)
#>  DBI             1.1.1   2021-01-15 [1] CRAN (R 4.0.3)
#>  digest          0.6.27  2020-10-24 [1] CRAN (R 4.0.3)
#>  dplyr           1.0.4   2021-02-02 [1] CRAN (R 4.0.3)
#>  e1071           1.7-6   2021-03-18 [1] CRAN (R 4.0.4)
#>  ellipsis        0.3.1   2020-05-15 [1] CRAN (R 4.0.2)
#>  evaluate        0.14    2019-05-28 [1] CRAN (R 4.0.2)
#>  fansi           0.4.2   2021-01-15 [1] CRAN (R 4.0.3)
#>  fs              1.5.0   2020-07-31 [1] CRAN (R 4.0.3)
#>  generics        0.1.0   2020-10-31 [1] CRAN (R 4.0.3)
#>  glue            1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  highr           0.8     2019-03-20 [1] CRAN (R 4.0.2)
#>  htmltools       0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
#>  KernSmooth      2.23-17 2020-04-26 [2] CRAN (R 4.0.2)
#>  knitr           1.31    2021-01-27 [1] CRAN (R 4.0.3)
#>  lattice         0.20-41 2020-04-02 [2] CRAN (R 4.0.2)
#>  lifecycle       1.0.0   2021-02-15 [1] CRAN (R 4.0.4)
#>  magrittr        2.0.1   2020-11-17 [1] CRAN (R 4.0.3)
#>  pillar          1.5.1   2021-03-05 [1] CRAN (R 4.0.4)
#>  pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.0.2)
#>  proxy           0.4-25  2021-03-05 [1] CRAN (R 4.0.4)
#>  purrr           0.3.4   2020-04-17 [1] CRAN (R 4.0.2)
#>  R6              2.5.0   2020-10-28 [1] CRAN (R 4.0.3)
#>  Rcpp            1.0.7   2021-07-07 [1] CRAN (R 4.0.5)
#>  reprex          2.0.0   2021-04-02 [1] CRAN (R 4.0.5)
#>  rlang           0.4.10  2020-12-30 [1] CRAN (R 4.0.3)
#>  rmarkdown       2.7     2021-02-19 [1] CRAN (R 4.0.4)
#>  rnaturalearth   0.1.0   2017-03-21 [1] CRAN (R 4.0.5)
#>  sessioninfo     1.1.1   2018-11-05 [1] CRAN (R 4.0.2)
#>  sf              0.9-7   2021-01-06 [1] CRAN (R 4.0.4)
#>  sp              1.4-5   2021-01-10 [1] CRAN (R 4.0.3)
#>  stringi         1.5.3   2020-09-09 [1] CRAN (R 4.0.3)
#>  stringr         1.4.0   2019-02-10 [1] CRAN (R 4.0.2)
#>  styler          1.3.2   2020-02-23 [1] CRAN (R 4.0.2)
#>  tibble          3.1.0   2021-02-25 [1] CRAN (R 4.0.4)
#>  tidyselect      1.1.0   2020-05-11 [1] CRAN (R 4.0.2)
#>  units           0.7-1   2021-03-16 [1] CRAN (R 4.0.4)
#>  utf8            1.2.1   2021-03-12 [1] CRAN (R 4.0.5)
#>  vctrs           0.3.6   2020-12-17 [1] CRAN (R 4.0.3)
#>  withr           2.4.1   2021-01-26 [1] CRAN (R 4.0.3)
#>  xfun            0.20    2021-01-06 [1] CRAN (R 4.0.3)
#>  yaml            2.2.1   2020-02-01 [1] CRAN (R 4.0.2)
#> 
#> [1] C:/Users/David/Documents/lib/R
#> [2] C:/Program Files/R/R-4.0.2/library

dlebauer avatar Sep 24 '21 22:09 dlebauer

There's a gist showing where to find the files on S3.

nvkelso avatar Sep 29 '21 16:09 nvkelso

@nvkelso should the package be updated to use the new urls? Or are the currently used URLs expected to come back online?

dlebauer avatar Sep 29 '21 17:09 dlebauer

For CI and build systems you should switch to the direct S3 URLs, as AWS now sponsors hosting via their public data program.
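As a sketch of what that switch could look like in R: fetch the zip straight from the S3 mirror and read it with sf. The bucket and key layout below are an assumption on my part; verify the exact paths against the gist mentioned earlier in this thread before relying on them.

```r
# Sketch (network required): download a layer from the assumed S3 mirror
# and read it with sf. Bucket/key layout is an assumption; check the gist
# linked earlier in this thread for the authoritative paths.
u <- paste0("https://naturalearth.s3.amazonaws.com/",
            "110m_cultural/ne_110m_admin_0_countries.zip")
zf <- tempfile(fileext = ".zip")
utils::download.file(u, zf, method = "libcurl", mode = "wb")
ud <- file.path(tempdir(), "ne110")
unzip(zf, exdir = ud)
countries <- sf::st_read(ud)  # reads the extracted shapefile
```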


nvkelso avatar Sep 29 '21 18:09 nvkelso

Encountering the same bug and I believe there is still an issue with the URLs on the website. I am able to manually download from the website, but this could be related to the onclick event mentioned earlier in the thread.

This code: ocean <- ne_download(type = 'ocean', scale = 'large', category = 'physical', returnclass='sf')

Returns:

trying URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_ocean.zip'
Error in utils::download.file(file.path(address), zip_file <- tempfile()) : 
  cannot open URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_ocean.zip'
In addition: Warning message:
In utils::download.file(file.path(address), zip_file <- tempfile()) :
  cannot open URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_ocean.zip': HTTP status was '500 Internal Server Error'

Further when I try to download directly using the following code (which works on other files):

filedir <- paste0(tempdir())
unlink(filedir, recursive = TRUE)
dir.create(filedir)
ocean_shp <- paste0('https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/physical/ne_110m_ocean.zip')
download.file(url = ocean_shp, destfile = paste0(filedir, basename(ocean_shp)))
unzip(paste0(filedir,basename(ocean_shp)), exdir= filedir)
list.files(path = filedir)

I get a similar error:

trying URL 'https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/physical/ne_110m_ocean.zip'
Error in download.file(url = ocean_shp, destfile = paste0(filedir, basename(ocean_shp))) : 
  cannot open URL 'https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/physical/ne_110m_ocean.zip'
In addition: Warning message:
In download.file(url = ocean_shp, destfile = paste0(filedir, basename(ocean_shp))) :
  cannot open URL 'https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/physical/ne_110m_ocean.zip': HTTP status was '500 Internal Server Error'

nmarchio avatar Feb 26 '22 01:02 nmarchio

I suspect this was during a rare maintenance window on the Natural Earth server. 500s are server errors. Downloading that link works for me today.

If you switch over to the S3 URLs then that is much less likely to affect you.

nvkelso avatar Feb 27 '22 17:02 nvkelso