disk.frame
support sfheaders
https://github.com/dcooley/sfheaders
https://github.com/dcooley/sfheaders/issues/40
I had a look at an example, because you made me think of scanning through a vector source without holding the entire sf object in memory. This uses the virtual FID field from GDAL to read one feature at a time:
library(disk.frame)
df_path <- file.path(tempdir(), "disk_frame_sf")
diskf <- disk.frame(df_path)
sfsrc <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
(layer <- sf::st_layers(sfsrc)$name[1L])
#> [1] "nc.gpkg"
## find out how many features and what the first FID is (it varies)
cnt <- sf::read_sf(sfsrc, query = sprintf("SELECT MIN(FID) AS minfid, COUNT(*) AS n_features FROM [%s]", layer))
#> Warning: no simple feature geometries present: returning a data.frame or
#> tbl_df
offset <- if (cnt$minfid == 0) 1 else 0
## scan a feature at a time
for (i in seq_len(cnt$n_features)) {
  sf0 <- sf::read_sf(sfsrc, query = sprintf("SELECT * FROM [%s] WHERE FID == %i",
                                            layer, i - offset))
  add_chunk(diskf, sfheaders::sf_to_df(sf0))
}
diskf
#> path: "/tmp/RtmpIC4Ica/disk_frame_sf"
#> nchunks: 100
#> nrow (at source): 2529
#> ncol (at source): 6
#> nrow (post operations): ???
#> ncol (post operations): ???
Created on 2020-03-14 by the reprex package (v0.3.0)
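The loop above writes one chunk per feature. A rough variant, not part of the reprex, is to read several features per query with LIMIT/OFFSET so each chunk holds a batch. This is only a sketch: `batch_size` is a made-up value, and it assumes the source accepts SQLite-style LIMIT/OFFSET (true for GeoPackage, not necessarily for other drivers) and that FID can be used for ordering as in the queries above.

```r
library(sf)

sfsrc <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
layers <- sf::st_layers(sfsrc)
layer <- layers$name[1L]
n_features <- as.integer(layers$features[1L])

## hypothetical batch size, tune to the memory you can spare
batch_size <- 25L
starts <- seq(0L, n_features - 1L, by = batch_size)

## read one batch at a time; only the row count is kept here,
## in the disk.frame loop this is where add_chunk() would go
batch_rows <- vapply(starts, function(skip) {
  sf0 <- sf::read_sf(sfsrc, query = sprintf(
    "SELECT * FROM [%s] ORDER BY FID LIMIT %d OFFSET %d",
    layer, batch_size, skip
  ))
  nrow(sf0)
}, integer(1))
```

Fewer, larger chunks should mean fewer round trips through GDAL, at the cost of holding `batch_size` features in memory at once.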
I wonder what kind of workflows you are envisioning?
There are plenty of other options here. The query for layer and FID is awkward via SQL, but with vapour (for example) we can scan or skip over geometries or attributes arbitrarily (sf is still needed to convert from binary).
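As a minimal sketch of the vapour route, assuming `vapour_read_geometry()` takes `skip_n` and `limit_n` arguments as in the versions I have used (check your installed version), and with the skip and limit values picked arbitrarily:

```r
library(vapour)

sfsrc <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)

## skip the first 10 features, then read the next 5 geometries as raw WKB
## (assumed arguments: skip_n, limit_n)
wkb <- vapour::vapour_read_geometry(sfsrc, skip_n = 10, limit_n = 5)

## sf does the conversion from binary; no CRS is attached at this step
geom <- sf::st_as_sfc(structure(wkb, class = "WKB"))
length(geom)
```

No SQL or FID bookkeeping is involved here, which is the appeal; the trade-off is that the geometries come back as bare binary, so attributes and the CRS have to be read and attached separately.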