[FEATURE] Download previous versions of the data from geofabrik

Open juanfonsecaLS1 opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe. Sometimes it is useful to have access to previous versions of the OSM data. Currently, osmextract always downloads the most recent version from its providers.

Describe the solution you'd like oe_get could use a parameter to indicate if the user wants a previous version of the data from geofabrik. Alternatively, a vignette can be created explaining how to do it.
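
A minimal sketch of what such a parameter might do internally, assuming geofabrik's dated snapshots sit next to the latest file and differ only by replacing "latest" with a six-digit yymmdd code (as the file listing further down suggests). The function name oe_versioned_url and its version argument are hypothetical and not part of osmextract.

```r
# Hypothetical helper (not an osmextract function): turn the URL of a
# "-latest.osm.pbf" extract into the URL of a dated snapshot.
oe_versioned_url <- function(latest_url, version) {
  # Assumption: the provider URL ends in "-latest.osm.pbf"
  stopifnot(grepl("-latest\\.osm\\.pbf$", latest_url))
  # Assumption: snapshots are labelled with a six-digit yymmdd code
  stopifnot(grepl("^[0-9]{6}$", version))
  sub("-latest\\.osm\\.pbf$", paste0("-", version, ".osm.pbf"), latest_url)
}

oe_versioned_url(
  "https://download.geofabrik.de/south-america/colombia-latest.osm.pbf",
  "230101"
)
#> [1] "https://download.geofabrik.de/south-america/colombia-230101.osm.pbf"
```

The helper only rewrites the string; it does not check that the snapshot exists on the server, so a real implementation would still need to handle a missing file.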

Describe alternatives you've considered This code retrieves an arbitrary earlier version of the Colombia pbf file from geofabrik:

library(osmextract)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright.
#> Check the package website, https://docs.ropensci.org/osmextract/, for more details.
library(rvest)

col_match <- oe_match("Colombia",provider = "geofabrik")
#> The input place was matched with: Colombia

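# Split the matched URL into the directory URL and the file name
u <- dirname(col_match$url)
f <- basename(col_match$url)

# Drop the "latest.osm.pbf" suffix to keep the region prefix (e.g. "colombia-")
id_files <- gsub("latest\\.osm\\.pbf", replacement = "", f)

# Scrape the directory index page; its first HTML table lists the available files
files_table <- (read_html(u) |> html_table())[[1]]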

head(files_table)
#> # A tibble: 6 × 5
#>   ``    Name                                `Last modified`    Size  Description
#>   <lgl> <chr>                               <chr>              <chr> <lgl>      
#> 1 NA    ""                                  ""                 ""    NA         
#> 2 NA    "Parent Directory"                  ""                 "-"   NA         
#> 3 NA    "argentina-140101-free.shp.zip"     "2018-04-27 06:55" "99M" NA         
#> 4 NA    "argentina-140101-free.shp.zip.md5" "2018-05-03 17:18" "64"  NA         
#> 5 NA    "argentina-140101.osm.pbf"          "2014-01-01 23:35" "58M" NA         
#> 6 NA    "argentina-150101-free.shp.zip"     "2018-04-27 06:51" "133… NA

# Keep only the dated .pbf snapshots for this region (e.g. "colombia-230101.osm.pbf")
available_versions <- grep(
  paste0("^", id_files, "\\d{6}\\.osm\\.pbf$"),
  files_table$Name,
  value = TRUE
)

head(available_versions)
#> [1] "colombia-140101.osm.pbf" "colombia-150101.osm.pbf"
#> [3] "colombia-160101.osm.pbf" "colombia-170101.osm.pbf"
#> [5] "colombia-180101.osm.pbf" "colombia-190101.osm.pbf"

# oe_read() accepts a URL, so the dated snapshot can be passed directly
net_old <- oe_read(paste0(u, "/", available_versions[10]))
#> The chosen file was already detected in the download directory. Skip downloading.
#> The corresponding gpkg file was already detected. Skip vectortranslate operations.
#> Reading layer `lines' from data source 
#>   `C:\Users\...\Documents\OSMEXT_downloads\geofabrik_colombia-230101.gpkg' 
#>   using driver `GPKG'
#> Simple feature collection with 1087521 features and 9 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -85.94982 ymin: -4.503316 xmax: -66.57158 ymax: 26.00379
#> Geodetic CRS:  WGS 84

Created on 2024-09-05 with reprex v2.1.1
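
Picking a snapshot by position (available_versions[10]) depends on how many files the server lists. A small sketch of selecting by date instead, assuming the six digits in the file name are a yymmdd code; pick_version is a hypothetical helper, not part of osmextract, and the file names below are copied from the output above.

```r
# Hypothetical helper (not an osmextract function): choose the newest
# snapshot dated on or before `target`.
pick_version <- function(files, target) {
  # Extract the six-digit code from names like "colombia-230101.osm.pbf"
  codes <- sub("^.*-(\\d{6})\\.osm\\.pbf$", "\\1", files)
  # Assumption: the code is yymmdd; names that do not parse become NA
  dates <- as.Date(codes, format = "%y%m%d")
  candidates <- which(dates <= as.Date(target))
  if (length(candidates) == 0) stop("No snapshot on or before ", target)
  files[candidates[which.max(as.numeric(dates[candidates]))]]
}

available_versions <- c(
  "colombia-140101.osm.pbf", "colombia-150101.osm.pbf",
  "colombia-160101.osm.pbf", "colombia-170101.osm.pbf",
  "colombia-180101.osm.pbf", "colombia-190101.osm.pbf"
)

pick <- pick_version(available_versions, "2017-06-30")
pick
#> [1] "colombia-170101.osm.pbf"
```

The selected name can then be appended to the directory URL and passed to oe_read as in the reprex.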

juanfonsecaLS1 · Sep 05 '24 15:09

👍

Robinlovelace · Sep 05 '24 15:09

Thank you very much for your suggestion! I think it's a nice and reasonable idea, and I'll do my best to implement it as soon as possible (maybe as an additional argument to oe_match & parents).

agila5 · Sep 05 '24 16:09

👍 thanks!

Robinlovelace · Jan 20 '25 16:01