Consider VRK's building address and voting district data (rakennusten osoitetiedot ja äänestysalueet)

Open muuankarski opened this issue 5 years ago • 2 comments

Väestörekisterikeskus (VRK, the Population Register Centre of Finland) annually publishes a dataset containing all buildings in Finland. The data is a zipped, semicolon-delimited file with an .OPT extension and has 3.6 million rows. It can be read and processed in R (slowly) with the following code:

# 2019
library(dplyr)
library(sp)
library(sf)
tmpfile <- tempfile()
tmpdir <- tempdir()
download.file("https://www.avoindata.fi/data/dataset/cf9208dc-63a9-44a2-9312-bbd2c3952596/resource/ae13f168-e835-4412-8661-355ea6c4c468/download/suomi_osoitteet_2019-05-15.zip",
              destfile = tmpfile)
unzip(zipfile = tmpfile,
      exdir = tmpdir)

opt <- read.csv(glue::glue("{tmpdir}/Suomi_osoitteet_2019-05-15.OPT"), 
                sep = ";", 
                stringsAsFactors = FALSE, 
                header = FALSE)

names(opt) <- c("rakennustu","sijaintiku",
                "sijaintima","rakennusty",
                "CoordY","CoordX",
                "osoitenume", "katunimi_f",
                "katunimi_s", "katunumero",
                "postinumer", "vaalipiirikoodi",
                "vaalipiirinimi","tyhja",
                "idx", "date")
if (FALSE) { # subsetting just to make conversions faster
  opt_orig <- as_tibble(opt)
  opt <- sample_n(opt_orig, size = 2000)
}

opt$katunimi_f <- iconv(opt$katunimi_f, from = "windows-1252", to = "UTF-8")
opt$katunimi_s <- iconv(opt$katunimi_s, from = "windows-1252", to = "UTF-8")
opt$katunumero <- iconv(opt$katunumero, from = "windows-1252", to = "UTF-8")
opt$vaalipiirinimi <- iconv(opt$vaalipiirinimi, from = "windows-1252", to = "UTF-8")

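# Build the sf point layer directly from the coordinate columns (EPSG:3067, ETRS-TM35FIN).
# This avoids the sp round trip and the deprecated "+init=epsg:" syntax.
# remove = FALSE keeps CoordX and CoordY as attribute columns, as the sp-based version did.
shape <- st_as_sf(opt,
                  coords = c("CoordX", "CoordY"),
                  crs = 3067,
                  remove = FALSE)

# Project the spatial data to lat/lon
# shape <- st_transform(shape, crs = 4326)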

# Inspect the first coordinates only; printing all 3.6 million rows is slow
head(st_coordinates(shape))

# shape %>% select(rakennustu) %>% plot()

saveRDS(shape, file = "./sf19_buildings.RDS")
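
The four `iconv()` calls above repeat the same conversion, so a more compact variant could convert every character column in one pass. The following is only a sketch on a toy stand-in for `opt` (the name `opt_demo` and its contents are made up for illustration); it assumes all text columns in the file share the windows-1252 encoding.

```r
# "ä" as a single windows-1252 byte, built from raw so the example
# does not depend on the encoding of this script or the session locale.
a_umlaut_cp1252 <- rawToChar(as.raw(0xE4))

# Toy stand-in for `opt` (hypothetical values, not taken from the real file).
opt_demo <- data.frame(
  katunimi_f = c(paste0("H", a_umlaut_cp1252, "meentie"), "Mannerheimintie"),
  katunimi_s = c(paste0("Tavastv", a_umlaut_cp1252, "gen"), "Mannerheimv\u00e4gen"),
  idx        = c(1L, 2L),
  stringsAsFactors = FALSE
)
# Keep the second Swedish name in windows-1252 as well, like the first one.
opt_demo$katunimi_s[2] <- paste0("Mannerheimv", a_umlaut_cp1252, "gen")

# Convert all character columns at once; numeric columns are left untouched.
is_text <- vapply(opt_demo, is.character, logical(1))
opt_demo[is_text] <- lapply(opt_demo[is_text], iconv,
                            from = "windows-1252", to = "UTF-8")
```

After the conversion the "ä" is stored as the two-byte UTF-8 sequence `c3 a4`, which can be checked with `charToRaw(opt_demo$katunimi_f[1])`.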

Any ideas on how to incorporate this into geofi? It is useful, for instance, when geocoding sensitive addresses.
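
To illustrate the geocoding use case: once the building table is available locally, addresses can be matched with a plain join, so they presumably never need to be sent to an external service. This is a minimal sketch with made-up data; it assumes that `katunimi_f`, `katunumero` and `postinumer` hold the Finnish street name, house number and postal code, which is inferred from the column names rather than confirmed. The `buildings` and `addresses` tables are hypothetical.

```r
# Toy stand-in for the preprocessed building table; coordinates are made up.
buildings <- data.frame(
  katunimi_f = c("Esimerkkikatu", "Esimerkkikatu", "Mallitie"),
  katunumero = c("1", "3", "10"),
  postinumer = c("00100", "00100", "33100"),
  CoordX     = c(385000, 385020, 327000),
  CoordY     = c(6672000, 6672010, 6822000),
  stringsAsFactors = FALSE
)

# Addresses to geocode; the last one has no match in the building table.
addresses <- data.frame(
  id         = c("a", "b", "c"),
  katunimi_f = c("Mallitie", "Esimerkkikatu", "Tuntematon tie"),
  katunumero = c("10", "3", "7"),
  postinumer = c("33100", "00100", "99999"),
  stringsAsFactors = FALSE
)

# Left join on the address fields; unmatched addresses get NA coordinates.
geocoded <- merge(addresses, buildings,
                  by = c("katunimi_f", "katunumero", "postinumer"),
                  all.x = TRUE)
geocoded <- geocoded[order(geocoded$id), ]
```

Real addresses would need normalisation (case, whitespace, house number suffixes) before an exact join like this gives usable hit rates.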

However, this would require storage, as the data should be preprocessed. Do you think this is suitable data for geofi, and should we create a data repo such as geofi_data?
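
Whichever place ends up hosting the preprocessed file, the user side could cache it locally so the slow step runs only once. The sketch below shows that idea in general terms; `cached_rds()` and `build_buildings()` are hypothetical helpers, not part of geofi, and the builder here returns toy data instead of downloading anything.

```r
# Return the object stored at `path` if it exists; otherwise call `build()`,
# store its result at `path`, and return it.
cached_rds <- function(path, build) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  obj <- build()
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(obj, path)
  obj
}

# Hypothetical builder standing in for the slow download-and-process step.
calls <- 0
build_buildings <- function() {
  calls <<- calls + 1
  data.frame(rakennustu = c("A1", "B2"), CoordX = c(385000, 327000))
}

cache_file <- file.path(tempfile("geofi_cache_demo"), "buildings.RDS")
first  <- cached_rds(cache_file, build_buildings)  # builds and stores
second <- cached_rds(cache_file, build_buildings)  # reads from the cache
```

Here `build_buildings()` runs only on the first call; the second call reads the stored RDS file.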

muuankarski, Aug 07 '19 05:08