CoordinateCleaner
Bug: wrong record flagged in clean_coordinates using data.frame
I'm running into a weird issue when running clean_coordinates. Reproducible example below. I have a data set with an obvious outlier. The basic workflow is 1. Get data from GBIF, (1a) convert tibble to data.frame, 2. Remove NA coords using subset, 3. Run clean_coordinates.
If I skip step 1a (i.e., send a tibble to clean_coordinates instead of a data.frame), the outlier is correctly flagged. If I do step 1a, a record is flagged, but it is the wrong row. If I convert to data.frame AFTER subset, everything is fine. If I use square brackets instead of subset.data.frame, everything is fine. I have verified that the data in all columns are identical after subsetting, regardless of whether I subset on the tibble or the data.frame.
R 3.6.3, macOS 10.15.7, rgbif 3.5.2, CoordinateCleaner 2.0.18, tibble 3.0.4.
library(rgbif)
library(CoordinateCleaner)
dat = occ_search(scientificName = "Sorex alpinus", limit=250)$data
dat_df = as.data.frame(dat)
dat_df_no_subset = as.data.frame(dat)
dat = subset(dat, !is.na(decimalLatitude))
dat_df = subset(dat_df, !is.na(decimalLatitude))
dat_df_no_subset = dat_df_no_subset[!is.na(dat_df_no_subset$decimalLatitude),]
## all of the data in all 3 tables is identical, as expected
all(mapply(identical, dat, dat_df))
all(mapply(identical, dat, dat_df_no_subset))
cl = clean_coordinates(dat, lon="decimalLongitude", lat="decimalLatitude", tests="outliers")
cl_df = clean_coordinates(dat_df, lon="decimalLongitude", lat="decimalLatitude", tests="outliers")
cl_conv_late = clean_coordinates(as.data.frame(dat), lon="decimalLongitude",
lat="decimalLatitude", tests="outliers")
cl_no_subset = clean_coordinates(dat_df_no_subset, lon="decimalLongitude",
lat="decimalLatitude", tests="outliers")
## different records are flagged, but only if converting to data frame before using subset
which(! cl$.summary)
which(! cl_df$.summary)
which(! cl_conv_late$.summary)
which(! cl_no_subset$.summary)
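
One thing that may be worth checking, offered as a guess and not a confirmed cause: the column-wise `identical()` comparison above only looks at column contents, so it would not notice a difference in row names. A data.frame keeps its original row names after `subset()`, so they are no longer 1..n once the NA rows are dropped. The sketch below uses a small made-up table (`toy` and its values are hypothetical, not GBIF data) and base R only.

```r
# Toy data standing in for the GBIF download (hypothetical values)
toy <- data.frame(id = 1:5, lat = c(10, NA, 12, NA, 14))

# Drop NA rows on the data.frame; original row names are kept
by_subset <- subset(toy, !is.na(lat))

# Same data with row names reset to 1..n
renumbered <- by_subset
rownames(renumbered) <- NULL

# Column-wise comparison reports no difference
cols_identical <- all(mapply(identical, by_subset, renumbered))

# Whole-object comparison also takes row names into account
objects_identical <- identical(by_subset, renumbered)

rownames(by_subset)   # "1" "3" "5"
rownames(renumbered)  # "1" "2" "3"
```

If the row names of dat_df turn out to be non-sequential in the same way, comparing the flagged row's name against its position (and re-running after `rownames(dat_df) <- NULL`) might help narrow down where the mismatch comes from.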