CoordinateCleaner
Incorrect flagging of geographical outliers in multi-species datasets
The test for geographical outliers gives inconsistent results depending on whether the dataset contains a single species or multiple species. In multi-species datasets, the flags appear to be assigned to the wrong records. I haven't had time to dig into the root cause, but it looks as though the row indices for the flags get mixed up somewhere along the way.
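To illustrate the kind of mix-up I suspect, here is a minimal toy sketch in base R. It is not taken from the CoordinateCleaner source, and the `flag_group` rule and the toy data are made up purely for illustration. It only shows how per-species flags can land on the wrong rows if they are concatenated in group order instead of being mapped back to the original row order.

```r
# Toy data: rows of two species are interleaved, as in a typical download.
df <- data.frame(
  species = c("a", "b", "a", "b", "a"),
  x = c(1, 10, 2, 11, 100)
)

# Hypothetical per-species rule (NOT the package's outlier test):
# TRUE = keep, FALSE = outlier.
flag_group <- function(x) abs(x - median(x)) <= 5

groups <- split(df$x, df$species)
flags_by_group <- lapply(groups, flag_group)

# Misaligned: plain concatenation follows group order (a, a, a, b, b),
# not the row order of df.
flags_wrong <- unlist(flags_by_group, use.names = FALSE)

# Aligned: unsplit() puts each flag back at its original row position.
flags_right <- unsplit(flags_by_group, df$species)

# The FALSE flag should sit on the row with x = 100.
df$x[!flags_wrong]
df$x[!flags_right]
```

With a single species there is only one group, so both approaches give the same result, which would be consistent with the single-species run looking correct.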
Here's a reproducible example for the genus Alouatta:
library(CoordinateCleaner)
library(rgbif)
library(dplyr)
# Download Alouatta records from GBIF
alouatta_data <- occ_search(
scientificName = "Alouatta",
hasCoordinate = TRUE,
fields = c("species", "decimalLongitude", "decimalLatitude"),
limit = 10000
)$data
# Remove records with missing coordinates and drop duplicate rows
alouatta_clean <- alouatta_data %>%
filter(!is.na(decimalLongitude) & !is.na(decimalLatitude)) %>%
distinct()
# Flag outliers for a single species only
flags_single <- alouatta_clean %>%
  filter(species == "Alouatta caraya") %>%
  CoordinateCleaner::clean_coordinates(tests = "outliers")
# Flag outliers for the full multi-species dataset
flags_mult <- alouatta_clean %>%
  CoordinateCleaner::clean_coordinates(tests = "outliers")
Result for the single-species run:
dplyr::filter(flags_single, !.summary)
# species latitude longitude .val .otl .summary
# Alouatta caraya -58.03303 -26.18056 TRUE FALSE FALSE
# Alouatta caraya -58.04886 -26.18373 TRUE FALSE FALSE
# Alouatta caraya 35.07838 -106.66348 TRUE FALSE FALSE
# Alouatta caraya 10.74601 -84.17884 TRUE FALSE FALSE
--> four records
Result for the multi-species run:
dplyr::filter(flags_mult, !.summary, species == "Alouatta caraya")
# species latitude longitude .val .otl .summary
# Alouatta caraya -27.01190 -59.44896 TRUE FALSE FALSE
# Alouatta caraya -28.53233 -57.13952 TRUE FALSE FALSE
# Alouatta caraya -28.56564 -59.25179 TRUE FALSE FALSE
# Alouatta caraya -19.44745 -57.07579 TRUE FALSE FALSE
# Alouatta caraya -28.56640 -59.26846 TRUE FALSE FALSE
# Alouatta caraya -28.57930 -59.24632 TRUE FALSE FALSE
# Alouatta caraya -24.97473 -60.88110 TRUE FALSE FALSE
# Alouatta caraya -28.40994 -57.18283 TRUE FALSE FALSE
--> eight records, all well within the core range, none overlapping with the records flagged in the single-species run
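For anyone who wants to check the discrepancy systematically, here is a rough sketch of a comparison helper. `compare_outlier_flags` is a hypothetical function of mine, not part of CoordinateCleaner, and it assumes the coordinate columns are named `decimalLongitude` and `decimalLatitude` and that each species/coordinate combination occurs only once (as after `distinct()`). The toy data frames below just stand in for `flags_single` and `flags_mult`.

```r
# Hypothetical helper (not part of CoordinateCleaner): list the records of one
# species whose outlier flag differs between a single-species and a
# multi-species run. Records are matched on species and coordinates.
compare_outlier_flags <- function(single, multi, species_name,
                                  keys = c("species", "decimalLongitude",
                                           "decimalLatitude"),
                                  flag = ".otl") {
  single_sp <- single[single$species == species_name, c(keys, flag)]
  multi_sp <- multi[multi$species == species_name, c(keys, flag)]
  merged <- merge(single_sp, multi_sp, by = keys,
                  suffixes = c("_single", "_multi"))
  differs <- merged[[paste0(flag, "_single")]] != merged[[paste0(flag, "_multi")]]
  merged[differs, ]
}

# Toy stand-ins for flags_single and flags_mult (made-up values).
single <- data.frame(
  species = "sp A",
  decimalLongitude = c(-58, -57, 10),
  decimalLatitude = c(-26, -27, 50),
  .otl = c(TRUE, TRUE, FALSE)
)
multi <- data.frame(
  species = c("sp A", "sp A", "sp A", "sp B"),
  decimalLongitude = c(-58, -57, 10, -60),
  decimalLatitude = c(-26, -27, 50, -20),
  .otl = c(TRUE, FALSE, TRUE, TRUE)
)

mismatches <- compare_outlier_flags(single, multi, "sp A")
print(mismatches)
```

With the real data, the call would be `compare_outlier_flags(flags_single, flags_mult, "Alouatta caraya")`. If the test behaved consistently, I would expect this to return zero rows.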