CoordinateCleaner
Filter for known defaults of coordinate uncertainty in meters
There are several known default values for coordinate uncertainty in meters.
- 301: GeoLocate default (often a country centroid)
- 3036: GeoLocate default
- 999: default found in a few datasets (observations.org)
- 9999: large default
Occurrence counts:
- 3036 m: 630,353
- 301 m: 401,507
- 999 m: 370,553
- 9999 m: 14,242
I think CoordinateCleaner could have a function for filtering these known defaults. I would be happy to make a PR for such a function.
https://github.com/gbif/pipelines/issues/417
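A minimal sketch of what such a filter could do, written in Python for illustration only; it is not CoordinateCleaner's R interface. It assumes records are dicts carrying the Darwin Core field `coordinateUncertaintyInMeters` and uses the four default values listed above. The function name `drop_known_uncertainty_defaults` is hypothetical.

```python
# Hypothetical sketch, not part of CoordinateCleaner.
# Known default values of coordinate uncertainty (in meters) from the list above.
KNOWN_UNCERTAINTY_DEFAULTS = {301, 3036, 999, 9999}


def drop_known_uncertainty_defaults(records, column="coordinateUncertaintyInMeters"):
    """Return only the records whose uncertainty is not a known default.

    Records with no uncertainty value are kept, since a missing value
    is not itself one of the known defaults.
    """
    kept = []
    for record in records:
        value = record.get(column)
        # Compare numerically so that e.g. 3036.0 or "3036" matches 3036.
        if value is not None and float(value) in KNOWN_UNCERTAINTY_DEFAULTS:
            continue
        kept.append(record)
    return kept


records = [
    {"id": 1, "coordinateUncertaintyInMeters": 3036},
    {"id": 2, "coordinateUncertaintyInMeters": 250},
    {"id": 3},
]
print([r["id"] for r in drop_known_uncertainty_defaults(records)])  # [2, 3]
```

Dropping records outright is the simplest behaviour; a flagging variant that keeps every record and lets the user decide may fit better with how other tests report problems.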
Hi John,
thanks for the excellent suggestion. I'll implement this for the next version. Two questions:
- What do you suggest as the default name for the column with the uncertainty in meters, since this will be user-provided?
- My impression is that default values may also cause problems in other fields, for instance individualCount. What do you think about an option to flag those as well?
Thanks!!
I don't have any opinions about individualCount right now.
My assumption is that there may be some default values there. GBIF has recently done a good job of cleaning up that column, now that it has the occurrence_status field: https://www.gbif.org/occurrence/search?taxon_key=4689&occurrence_status=present
> What do you suggest as default name for the column with the uncertainty in meters, since this will be user provided
I would name the issue or column something like "known_default_coordinate_uncertainty"
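A sketch of the flagging variant, again in Python for illustration and not CoordinateCleaner's actual interface. The input column name is a user-supplied parameter, and the output flag uses the name suggested here. `flag_known_defaults` is a hypothetical function. Because the set of defaults is also a parameter, the same check could in principle be pointed at another field such as individualCount, although this thread does not establish which values count as defaults there.

```python
# Hypothetical sketch: flags records instead of removing them.


def flag_known_defaults(records, column, defaults, flag_name):
    """Return copies of `records` with a boolean `flag_name` field.

    The flag is True when `column` holds one of `defaults`
    and False when the value is missing or not a known default.
    """
    default_values = {float(d) for d in defaults}
    flagged = []
    for record in records:
        value = record.get(column)
        is_default = value is not None and float(value) in default_values
        # Copy each record so the caller's data is not modified in place.
        flagged.append({**record, flag_name: is_default})
    return flagged


out = flag_known_defaults(
    [{"coordinateUncertaintyInMeters": 999},
     {"coordinateUncertaintyInMeters": 10},
     {}],
    column="coordinateUncertaintyInMeters",
    defaults={301, 3036, 999, 9999},
    flag_name="known_default_coordinate_uncertainty",
)
print([r["known_default_coordinate_uncertainty"] for r in out])  # [True, False, False]
```

Keeping every record and adding a flag leaves the decision to drop, inspect, or down-weight flagged records with the user.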