
support sfheaders

Open: xiaodaigh opened this issue 6 years ago • 1 comment

https://github.com/dcooley/sfheaders

https://github.com/dcooley/sfheaders/issues/40

xiaodaigh, Jan 06 '20 23:01

I had a look at an example, because you made me think of scanning through a vector source without having the entire sf object in memory. This uses the virtual FID field from GDAL to read one feature at a time:

library(disk.frame)

df_path <- file.path(tempdir(), "disk_frame_sf")
diskf <- disk.frame(df_path)

sfsrc <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
(layer <- sf::st_layers(sfsrc)$name[1L])
#> [1] "nc.gpkg"
## find out how many features and what the first FID is (it varies)
cnt <- sf::read_sf(sfsrc, query = sprintf("SELECT MIN(FID) AS minfid, COUNT(*) AS n_features FROM [%s]", layer))
#> Warning: no simple feature geometries present: returning a data.frame or
#> tbl_df
offset <- if (cnt$minfid == 0) 1 else 0
## scan a feature at a time
for (i in seq_len(cnt$n_features) ) {
  sf0 <- sf::read_sf(sfsrc, query = sprintf("SELECT * FROM [%s] WHERE FID == %i", 
                                              layer, i - offset))
  add_chunk(diskf, sfheaders::sf_to_df(sf0))
}

diskf
#> path: "/tmp/RtmpIC4Ica/disk_frame_sf"
#> nchunks: 100
#> nrow (at source): 2529
#> ncol (at source): 6
#> nrow (post operations): ???
#> ncol (post operations): ???

Created on 2020-03-14 by the reprex package (v0.3.0)

I wonder what kind of workflows you are envisioning?

There are a lot of other options here. The query for layer and FID is awkward via SQL, but with vapour (for example) we can scan or skip over geometries or attributes arbitrarily (we still need sf to convert from binary).
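
One such option, staying with sf and SQL, is to read a batch of features per chunk rather than a single feature. The following is only a rough sketch under stated assumptions: `batch_queries()` is a hypothetical helper (not part of sf, sfheaders or disk.frame), the batch size of 25 is arbitrary, and the `LIMIT ... OFFSET` and `ORDER BY FID` clauses assume a driver that accepts SQLite-style SQL, as the GeoPackage used above does.

```r
## Hypothetical helper: build one SQL query per batch of features.
## Assumes the driver understands SQLite-style LIMIT/OFFSET (true for GPKG).
batch_queries <- function(layer, n_features, batch_size = 25L) {
  n_features <- as.integer(n_features)
  batch_size <- as.integer(batch_size)
  if (n_features < 1L) return(character(0))
  offsets <- seq(0L, n_features - 1L, by = batch_size)
  sprintf("SELECT * FROM [%s] ORDER BY FID LIMIT %d OFFSET %d",
          layer, batch_size, as.integer(offsets))
}

## Only run the read/write loop when the packages are available.
if (requireNamespace("sf", quietly = TRUE) &&
    requireNamespace("sfheaders", quietly = TRUE) &&
    requireNamespace("disk.frame", quietly = TRUE)) {
  sfsrc <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
  layer <- sf::st_layers(sfsrc)$name[1L]
  ## count features (warns that no geometries are present, as above)
  n <- sf::read_sf(sfsrc, query = sprintf("SELECT COUNT(*) AS n FROM [%s]", layer))$n
  diskf <- disk.frame::disk.frame(file.path(tempdir(), "disk_frame_sf_batched"))
  ## one chunk per batch of features instead of one chunk per feature
  for (q in batch_queries(layer, n)) {
    disk.frame::add_chunk(diskf, sfheaders::sf_to_df(sf::read_sf(sfsrc, query = q)))
  }
}
```

With 100 features and a batch size of 25 this would write 4 chunks instead of 100. Note that the ids produced by `sf_to_df()` restart within each chunk, exactly as they do in the per-feature loop, so a real workflow would probably want to carry the FID along as well.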

mdsumner, Mar 14 '20 11:03