Incorrect ISO country codes

Open stepps00 opened this issue 7 years ago • 12 comments

Incorrect country codes are sometimes being used at the neighbourhood and locality placetypes.

  • DN: Used for records in Denmark: https://spelunker.whosonfirst.org/id/85633121/descendants/?exclude=nullisland&iso=dn
  • TU: Used for records in Turkey: https://spelunker.whosonfirst.org/id/85632393/descendants/?exclude=nullisland&iso=tu
  • KO: Used for records in South Korea: https://spelunker.whosonfirst.org/id/85632231/descendants/?exclude=nullisland&iso=ko
  • KO: Used for records in Serbia: https://spelunker.whosonfirst.org/id/85633755/descendants/?exclude=nullisland&iso=ko
  • XX: Used to house records in various countries, needs investigating.

Also ensure that all values are capitalized.
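A check along these lines could flag both problems (the suspect codes and the capitalization) in one pass. This is only a minimal sketch: `check_country_code` and `SUSPECT_CODES` are hypothetical names, and the suspect set is just the codes listed in this issue, not a full validation against the ISO list.

```python
# Hypothetical helper, not part of any Who's On First tooling.
# The suspect set is limited to the codes called out in this issue.
SUSPECT_CODES = {"DN", "TU", "KO", "XX"}


def check_country_code(value):
    """Return (normalized_code, problems) for a raw country code value."""
    problems = []
    code = (value or "").strip()
    # Flag values that are not already fully upper case.
    if code != code.upper():
        problems.append("not capitalized")
    code = code.upper()
    # Country codes in these records are expected to be two letters.
    if len(code) != 2 or not code.isalpha():
        problems.append("not a two-letter code")
    elif code in SUSPECT_CODES:
        problems.append("code flagged in this issue")
    return code, problems
```

For example, `check_country_code("dn")` reports both the capitalization problem and the flagged code, while `check_country_code("DK")` reports nothing.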

stepps00 avatar Jan 09 '19 22:01 stepps00

This will be a next up, as we're now bucketing records by country code.

stepps00 avatar May 07 '19 22:05 stepps00

Records in China with CH country codes (Switzerland): https://spelunker.whosonfirst.org/id/85632695/descendants/?exclude=nullisland&iso=ch

stepps00 avatar May 07 '19 22:05 stepps00

Now that we're dealing with per-country repositories based on ISO codes, I'd like to prioritize this issue (and any other similar issues with country codes).

What is the best approach here? Simply changing the country code value, removing the record from its current repo, and moving it into the correct repo? Or should this involve superseding into a new record? cc @thisisaaronland @nvkelso

stepps00 avatar May 20 '19 22:05 stepps00

I think updating the country code and moving to the appropriate repo. No ID change or superseding.

nvkelso avatar May 20 '19 22:05 nvkelso

See also: https://github.com/whosonfirst-data/whosonfirst-data/issues/1642

stepps00 avatar Jul 01 '19 22:07 stepps00

More cases in Switzerland, where locality records should have country codes for China or Chile:

102017915
102027727
102017575
102027723
102017613
102017875
102017851
102017801
102027575
102017603
102017129
102018099
102017263
102018055
102017131
102017955
102018061
102017095

stepps00 avatar Aug 02 '19 20:08 stepps00

102027727 is an odd one... any idea how this got associated with Switzerland?

missinglink avatar Aug 05 '19 10:08 missinglink

I did some analysis on this today and found at least 24149 records which might be affected by this issue. https://gist.github.com/missinglink/ebc6f77519af4cd7e230406102517a99

The script walks up the hierarchy looking for records where the wof:repo property changes.

I'm excluding multiple hierarchies (as these often change ISO codes), -1 and repos admin-xy and admin-xx, so the number might be slightly larger. FYI the list of repos in that gist are in a non-deterministic order.

I gave it a quick sanity check and it looks correct, a lot of them are between the Indian and Pakistani repos.. and Ukrainian and Russian repos.
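For anyone who wants to reproduce the idea without reading the gist, the walk can be sketched roughly as below. This is not the gist's code. It assumes the records are already loaded into a dict keyed by ID, that each record carries `wof:repo`, `wof:parent_id` and `wof:hierarchy` properties, and `find_repo_changes` is a hypothetical name.

```python
# A rough sketch of the hierarchy walk described above, under the stated
# assumptions about how records are loaded. Not the code from the gist.
EXCLUDED_REPO_SUFFIXES = ("admin-xy", "admin-xx")


def find_repo_changes(records):
    """Yield (record_id, repo, ancestor_id, ancestor_repo) for each record
    whose parent chain reaches an ancestor stored in a different repo."""
    for rec_id, props in records.items():
        # Skip records with multiple hierarchies; these often span ISO codes.
        if len(props.get("wof:hierarchy", [])) > 1:
            continue
        repo = props.get("wof:repo", "")
        if repo.endswith(EXCLUDED_REPO_SUFFIXES):
            continue
        seen = {rec_id}  # guards against parent cycles
        parent_id = props.get("wof:parent_id", -1)
        # A parent of -1 (or any unknown ID) is not in `records`, so the walk stops.
        while parent_id in records and parent_id not in seen:
            seen.add(parent_id)
            parent = records[parent_id]
            parent_repo = parent.get("wof:repo", "")
            excluded = parent_repo.endswith(EXCLUDED_REPO_SUFFIXES)
            if not excluded and parent_repo != repo:
                yield (rec_id, repo, parent_id, parent_repo)
                break
            parent_id = parent.get("wof:parent_id", -1)
```

Note that this version reports every descendant below the point where the repo changes, so counts from it would not necessarily match the number above.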

missinglink avatar Aug 05 '19 11:08 missinglink

It would be ideal if WOF consumers only needed to download the admin data for their target ISO code (along with the admin-xy data) to be sure they have everything needed to resolve hierarchy references.

Let me know if there is anything else I can do to help.

missinglink avatar Aug 05 '19 11:08 missinglink

Thanks for crafting that gist.. looks like 102027727 was added to the Switzerland repo because the source incorrectly attributed a CH (Switzerland) country code instead of CN (China).

For these ~25k records, the correct fix depends on the case. Where a record has a dual hierarchy spanning multiple countries, update the iso and wof country codes to XX. Where a record simply has the wrong country code (like the 102027727 example), update the iso and wof country codes to the correct value.
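As a sketch of that rule, assuming we can already resolve one country code per hierarchy for a record (the resolution step is not shown, and `proposed_country_code` is a hypothetical name):

```python
def proposed_country_code(hierarchy_countries):
    """Pick the country code to write, given one code per hierarchy.

    Returns "XX" when the hierarchies span more than one country, the single
    shared code when they agree, and None when nothing could be resolved.
    """
    # Normalize case and drop empty values before comparing.
    codes = {code.upper() for code in hierarchy_countries if code}
    if not codes:
        return None
    if len(codes) > 1:
        return "XX"
    return next(iter(codes))
```

So a record whose only hierarchy resolves to `CN` would get `CN`, and a record with one hierarchy in `IN` and another in `PK` would get `XX`.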

stepps00 avatar Aug 05 '19 16:08 stepps00

Some records in the Faroe Islands also have mismatching wof:country and iso:country codes - maintaining both "FO" (correct) and "FR" (incorrect).

Example: https://spelunker.whosonfirst.org/id/101873099/
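Mismatches like this should be easy to find mechanically. A minimal sketch, assuming a record's properties are available as a dict (`has_country_mismatch` is a hypothetical name):

```python
def has_country_mismatch(props):
    """Return True when wof:country and iso:country disagree, ignoring case."""
    wof_country = (props.get("wof:country") or "").strip().upper()
    iso_country = (props.get("iso:country") or "").strip().upper()
    return wof_country != iso_country
```

This only detects the disagreement. Deciding which of the two values is correct still needs a look at the record's hierarchy.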

stepps00 avatar Nov 09 '19 00:11 stepps00

Others from #2084:

  • GP – consider if this should be in FR instead since it's an overseas region of France? If so move items.
  • GF – consider if this should be in FR instead since it's an overseas region of France? If so move items.
  • SJ – investigate
  • UM – country code is valid for United States Minor Outlying Islands... but investigate where it should go

nvkelso avatar Apr 20 '23 20:04 nvkelso