pointblank
Checks for sf-objects
I intended to check key properties of `sf`/`sfc` objects using `rows_not_duplicated()`. The check was supposed to ignore the geometry column of the object (cf. 2nd example in reprex).
It seems that `interrogate()` ran into an error because of the way `summarize()` works on these objects.
Reprex example:
library(pointblank)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3
# Geometry object with 2 features
g <- rep(st_sfc(st_point(1:2)), 2)
# vector with 2 entries
v <- c("a", "b")
# object including both objects
mixed_obj <- st_sf("vector" = v, "points" = g)
mixed_obj
#> Simple feature collection with 2 features and 1 field
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 1 ymin: 2 xmax: 1 ymax: 2
#> epsg (SRID): NA
#> proj4string: NA
#> vector points
#> 1 a POINT (1 2)
#> 2 b POINT (1 2)
agent <- create_agent()
agent %>%
  focus_on("mixed_obj") %>%
  rows_not_duplicated() %>%
  interrogate()
#> Error: Can't coerce element 2 from a list to a double
# The same error occurs when I only check whether column "vector" is duplicated
# (likely because `sf` objects have "sticky geometries")
agent <- create_agent()
agent %>%
  focus_on("mixed_obj") %>%
  rows_not_duplicated(cols = vector) %>%
  interrogate()
#> Error: Can't coerce element 2 from a list to a double
Created on 2019-02-12 by the reprex package (v0.2.1)
I think it happens at the following chunk in `interrogate()`, in the section "# Judge tables on expectation of non-duplicated rows":
# Get total count of rows
row_count <-
  table %>%
  dplyr::group_by() %>%
  dplyr::summarize(row_count = n()) %>%
  dplyr::as_tibble() %>%
  purrr::flatten_dbl()
My expectation would be that
- in the first case of the reprex (`rows_not_duplicated()`, without specifying columns), each whole row, including the geometry column, would be compared with the others.
- in the second case (`rows_not_duplicated(cols = vector)`), the check would be done only for the column "vector".
Perhaps a solution might be to call `as_tibble()` before `group_by()` and `summarize()`?
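To illustrate the idea, here is a minimal sketch (not pointblank's actual code) of how dropping the `sf` class before summarizing avoids the sticky geometry. It reuses the toy object from the reprex. The names `plain_df`, `no_geom`, `row_count` and `dup_vector` are made up for illustration, and it assumes an `sf` version that provides `st_drop_geometry()`.

```r
library(sf)

# Same toy object as in the reprex above
g <- rep(st_sfc(st_point(1:2)), 2)
v <- c("a", "b")
mixed_obj <- st_sf("vector" = v, "points" = g)

# as.data.frame() removes the "sf" class but keeps the geometry as an
# ordinary list-column, so summarize() no longer re-attaches a geometry
plain_df <- as.data.frame(mixed_obj)
row_count <- dplyr::summarize(plain_df, row_count = dplyr::n())[[1]]

# For a check restricted to non-geometry columns, the geometry can be
# dropped entirely (assumes sf::st_drop_geometry() is available)
no_geom <- st_drop_geometry(mixed_obj)
dup_vector <- duplicated(no_geom["vector"])
```

Whether the same conversion is safe at every point inside `interrogate()` is something only the maintainers can confirm. The sketch only shows that the row count becomes a plain number once the object is no longer of class `sf`.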
CC: @krlmlr
Just following up on this with what seems to be a related issue: `scan_data()` does not appear to work on sf data.
Data on bus_stops downloaded from https://data.a2gov.org/feeds/GIS/AATA%20BusStops/AATA_Bus_Stops.shp.xml
It just appears to stop with
Error in sum(as.vector(t(collected))) : invalid 'type' (list) of argument
Example code (and error) below
R> library(sf)
Linking to GEOS 3.8.1, GDAL 3.2.1, PROJ 7.2.1
R> bus_data <- sf::st_read('~/Downloads/AATABusStops/AATABusStops.shp')
Reading layer `AATABusStops' from data source `/Users/peterhiggins/Downloads/AATABusStops/AATABusStops.shp' using driver `ESRI Shapefile'
Simple feature collection with 1616 features and 12 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -84.02867 ymin: 42.21356 xmax: -83.48754 ymax: 42.32714
Geodetic CRS: NAD83
R> pointblank::scan_data(bus_data)
── Data Scan started. Processing 6 sections. ───
ℹ Starting assembly of 'Overview' section...
Error in sum(as.vector(t(collected))) : invalid 'type' (list) of argument
R> class(bus_data)
[1] "sf" "data.frame"