
Incorrect flagging of geographical outliers in multi-species datasets

Open ChrKoenig opened this issue 4 months ago • 2 comments

The test for geographical outliers (tests = "outliers") gives inconsistent results depending on whether the dataset contains a single species or multiple species. In the multi-species dataset, flags appear to be assigned to the wrong records. I haven't had time to dig into the root cause, but it looks like the row indices of the flags get mixed up somewhere along the way.

Here's a reproducible example for the genus Alouatta:

library(CoordinateCleaner)
library(rgbif)
library(dplyr)

# Download Alouatta records from GBIF
alouatta_data <- occ_search(
  scientificName = "Alouatta", 
  hasCoordinate = TRUE,
  fields = c("species", "decimalLongitude", "decimalLatitude"),
  limit = 10000
)$data

# Remove records with missing coordinates and drop duplicate rows
alouatta_clean <- alouatta_data %>%
  filter(!is.na(decimalLongitude) & !is.na(decimalLatitude)) %>% 
  distinct()

# Flag coordinates: once for a single species, once for the full multi-species dataset
flags_single <- alouatta_clean %>%
  filter(species == "Alouatta caraya") %>%
  CoordinateCleaner::clean_coordinates(tests = "outliers")
flags_mult <- alouatta_clean %>%
  CoordinateCleaner::clean_coordinates(tests = "outliers")


Result for the single-species run:

dplyr::filter(flags_single, !.summary)

#         species        latitude        longitude .val  .otl .summary
# Alouatta caraya       -58.03303        -26.18056 TRUE FALSE    FALSE
# Alouatta caraya       -58.04886        -26.18373 TRUE FALSE    FALSE
# Alouatta caraya        35.07838       -106.66348 TRUE FALSE    FALSE
# Alouatta caraya        10.74601        -84.17884 TRUE FALSE    FALSE

--> four records

Result for the multi-species run:

dplyr::filter(flags_mult, !.summary, species == "Alouatta caraya")

#         species        latitude        longitude .val  .otl .summary
# Alouatta caraya       -27.01190        -59.44896 TRUE FALSE    FALSE
# Alouatta caraya       -28.53233        -57.13952 TRUE FALSE    FALSE
# Alouatta caraya       -28.56564        -59.25179 TRUE FALSE    FALSE
# Alouatta caraya       -19.44745        -57.07579 TRUE FALSE    FALSE
# Alouatta caraya       -28.56640        -59.26846 TRUE FALSE    FALSE
# Alouatta caraya       -28.57930        -59.24632 TRUE FALSE    FALSE
# Alouatta caraya       -24.97473        -60.88110 TRUE FALSE    FALSE
# Alouatta caraya       -28.40994        -57.18283 TRUE FALSE    FALSE

--> eight records, all well within the species' core range, and none of them overlap with the single-species flags
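
The lack of overlap can be checked directly by joining the two sets of flagged records on species and coordinates. This is only a minimal sketch: the stand-in data frames copy a few of the flagged rows listed above, and `flagged_single`, `flagged_mult` and `overlap` are illustrative names that are not part of the reproducible example. With the real objects, the same join would presumably be applied to the filtered `flags_single` and `flags_mult`, using whatever coordinate column names they actually carry.

```r
# Stand-in data: two flagged rows from each run, copied from the output above
flagged_single <- data.frame(
  species   = "Alouatta caraya",
  latitude  = c(-58.03303, 35.07838),
  longitude = c(-26.18056, -106.66348)
)
flagged_mult <- data.frame(
  species   = "Alouatta caraya",
  latitude  = c(-27.01190, -28.53233),
  longitude = c(-59.44896, -57.13952)
)

# Inner join on species and coordinates: keeps only rows flagged in both runs
overlap <- merge(
  flagged_single, flagged_mult,
  by = c("species", "latitude", "longitude")
)

# Number of records flagged in both runs
nrow(overlap)
```

If the flags were assigned consistently, every record flagged in the single-species run should also show up in the multi-species run, so an empty join points to a mismatch.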

ChrKoenig, Oct 07 '24 11:10