Filter for known defaults of coordinate uncertainty in meters

Open jhnwllr opened this issue 4 years ago • 3 comments

There are several known default values for coordinate uncertainty in meters.

  • 301 : Geolocate default (often a country centroid)
  • 3036 : Geolocate default
  • 999 : Default found in a few datasets (observations.org)
  • 9999 : Large default

Occurrence counts:

  • 630 353 -- 3036 m
  • 401 507 -- 301 m
  • 370 553 -- 999 m
  • 14 242 -- 9999 m

I think CoordinateCleaner could have a function for filtering these known defaults. I would be happy to make a PR for such a function...
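To make the idea concrete, here is a minimal sketch of what such a filter might look like. It is written in plain Python for illustration only, not as CoordinateCleaner code: the function name `flag_default_uncertainty` is hypothetical, and the column name `coordinateUncertaintyInMeters` is an assumption borrowed from the Darwin Core term. The set of defaults is just the four values listed in this issue.

```python
# Known default values (in meters) taken from the list in this issue.
KNOWN_DEFAULT_UNCERTAINTIES_M = {301, 3036, 999, 9999}


def flag_default_uncertainty(records, column="coordinateUncertaintyInMeters"):
    """Return one boolean per record: True if the value is a known default.

    Hypothetical helper, not part of CoordinateCleaner. The column name is
    an assumption. Missing or non-numeric values are not flagged.
    """
    flags = []
    for record in records:
        value = record.get(column)
        try:
            flags.append(float(value) in KNOWN_DEFAULT_UNCERTAINTIES_M)
        except (TypeError, ValueError):
            # None or unparseable text: treat as "not a known default".
            flags.append(False)
    return flags


# Made-up example records, for illustration only.
records = [
    {"species": "A", "coordinateUncertaintyInMeters": 301},
    {"species": "B", "coordinateUncertaintyInMeters": 250},
    {"species": "C", "coordinateUncertaintyInMeters": "9999"},
    {"species": "D", "coordinateUncertaintyInMeters": None},
]

flags = flag_default_uncertainty(records)
# Keep only the records whose uncertainty is not a known default.
cleaned = [r for r, is_default in zip(records, flags) if not is_default]
```

Returning flags instead of dropping rows directly leaves it up to the user whether to remove the flagged records or just inspect them.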

https://github.com/gbif/pipelines/issues/417

jhnwllr avatar Nov 19 '20 10:11 jhnwllr

Hi John,

thanks for the excellent suggestion. I'll implement this for the next version. Two questions:

  • What do you suggest as the default name for the column with the uncertainty in meters, since this will be user-provided?
  • My impression is that default values may also cause problems in other entry fields. For instance, the individualCount. What do you think about an option to flag those as well?

azizka avatar Nov 27 '20 17:11 azizka

Thanks!!

I don't have any opinions about individualCount right now.

My assumption would be that there might be some default values there. GBIF has recently done a good job of cleaning up that column, since GBIF now has the occurrence_status field: https://www.gbif.org/occurrence/search?taxon_key=4689&occurrence_status=present

jhnwllr avatar Dec 07 '20 13:12 jhnwllr

"What do you suggest as default name for the column with the uncertainty in meters, since this will be user provided"

I would name the issue or column something like "known_default_coordinate_uncertainty".

jhnwllr avatar Dec 07 '20 13:12 jhnwllr