Incorrect ISO country codes
Incorrect country codes are sometimes being used at the neighbourhood and locality placetypes.
- DN: Used for records in Denmark: https://spelunker.whosonfirst.org/id/85633121/descendants/?exclude=nullisland&iso=dn
- TU: Used for records in Turkey: https://spelunker.whosonfirst.org/id/85632393/descendants/?exclude=nullisland&iso=tu
- KO: Used for records in South Korea: https://spelunker.whosonfirst.org/id/85632231/descendants/?exclude=nullisland&iso=ko
- KO: Used for records in Serbia: https://spelunker.whosonfirst.org/id/85633755/descendants/?exclude=nullisland&iso=ko
- XX: Used to house records in various countries; needs investigating.
Also ensure that all values are capitalized.
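A rough sketch of how such a check could look, assuming each record's properties are available as a plain dict with an `iso:country` key. The `VALID_CODES` set here is a hypothetical, deliberately incomplete stand-in for the full ISO 3166-1 alpha-2 list, and `check_country_code` is an illustrative name, not an existing WOF tool:

```python
# Hypothetical, incomplete set of valid codes, for illustration only.
# A real check would load the full ISO 3166-1 alpha-2 list.
VALID_CODES = {"DK", "TR", "KR", "RS", "CN", "CH", "CL", "FO", "FR"}

# WOF-specific placeholder code mentioned in this thread.
PLACEHOLDER_CODES = {"XX"}


def check_country_code(props, key="iso:country"):
    """Return (normalized_code, status) for one record's properties.

    status is "ok", "placeholder", or "invalid".
    """
    raw = props.get(key) or ""
    code = raw.strip().upper()  # normalize to upper case
    if code in PLACEHOLDER_CODES:
        status = "placeholder"
    elif code in VALID_CODES:
        status = "ok"
    else:
        status = "invalid"
    return code, status
```

Comparing the returned code with the stored value also shows which records only need their case fixed.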
This will be next up, as we're now bucketing records by country code.
Records in China with CH country codes (Switzerland):
https://spelunker.whosonfirst.org/id/85632695/descendants/?exclude=nullisland&iso=ch
Now that we're dealing with per-country repositories based on ISO codes, I'd like to prioritize this issue (and any other similar issue w country codes).
What is the best approach here? Simply changing the country code value, removing the record from its current repo and moving it into the correct repo? Or should this involve superseding into a new record? cc @thisisaaronland @nvkelso
I think updating the country code and moving the record to the appropriate repo is the way to go. No ID change or superseding.
See also: https://github.com/whosonfirst-data/whosonfirst-data/issues/1642
More cases in Switzerland, where locality records should have country codes for China or Chile:
102017915
102027727
102017575
102027723
102017613
102017875
102017851
102017801
102027575
102017603
102017129
102018099
102017263
102018055
102017131
102017955
102018061
102017095
102027727 is an odd one... any idea how this got associated with Switzerland?
I did some analysis on this today and found at least 24149 records which might be affected by this issue. https://gist.github.com/missinglink/ebc6f77519af4cd7e230406102517a99
The script walks up the hierarchy looking for records where the wof:repo property changes.
I'm excluding records with multiple hierarchies (as these often change ISO codes), -1 values, and the admin-xy and admin-xx repos, so the real number might be slightly larger. FYI the list of repos in that gist is in a non-deterministic order.
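For readers who don't want to open the gist, the idea can be sketched roughly as follows. This is not the gist's code: it assumes a hypothetical in-memory index `records` mapping each `wof:id` to its properties dict, and it assumes repo names end in `admin-xy` / `admin-xx` for the excluded repos.

```python
# Repo name suffixes to skip (assumed naming, e.g. "...-admin-xy").
EXCLUDED_REPO_SUFFIXES = ("admin-xy", "admin-xx")


def find_repo_mismatches(records):
    """Return (id, repo, ancestor_id, ancestor_repo) tuples where an
    ancestor in the record's single hierarchy lives in a different repo.

    `records` is a hypothetical dict: wof:id -> properties dict.
    """
    mismatches = []
    for wof_id, props in records.items():
        repo = props.get("wof:repo", "")
        hierarchies = props.get("wof:hierarchy", [])
        # Skip records with zero or multiple hierarchies.
        if len(hierarchies) != 1:
            continue
        if repo.endswith(EXCLUDED_REPO_SUFFIXES):
            continue
        for ancestor_id in hierarchies[0].values():
            # Skip placeholder ancestors and the record itself.
            if ancestor_id == -1 or ancestor_id == wof_id:
                continue
            ancestor = records.get(ancestor_id)
            if ancestor is None:
                continue
            ancestor_repo = ancestor.get("wof:repo", "")
            if ancestor_repo.endswith(EXCLUDED_REPO_SUFFIXES):
                continue
            if ancestor_repo != repo:
                mismatches.append((wof_id, repo, ancestor_id, ancestor_repo))
    return mismatches
```

A real run would need to stream records from the repos rather than hold everything in one dict; the dict just keeps the sketch short.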
I gave it a quick sanity check and it looks correct. A lot of them are between the Indian and Pakistani repos, and between the Ukrainian and Russian repos.
It would be ideal if WOF consumers only needed to download the admin data for their target ISO code (along with the admin-xy data) in order to ensure that they have all the data they need to satisfy hierarchy references.
Let me know if there is anything else I can do to help.
Thanks for crafting that gist. It looks like 102027727 was added to the Switzerland repo because the source incorrectly attributed a CH (Switzerland) country code instead of CN (China).
For these ~25k records, the correct fix depends on the case. If a record has a dual hierarchy spanning multiple countries, update its iso and wof country codes to XX. If a record simply has the wrong country code (like the 102027727 example), update its iso and wof country codes to the correct value.
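As a rough sketch of that rule, assuming the country codes found across a record's hierarchies have already been collected into a list (the function names and that input shape are hypothetical, not part of any existing WOF tooling):

```python
def resolve_country_code(hierarchy_country_codes):
    """Pick the code a record should carry, given the country codes
    seen across its hierarchies.

    Returns "XX" for multiple countries, the single code if there is
    exactly one, or None if nothing usable was supplied.
    """
    codes = {c.strip().upper() for c in hierarchy_country_codes if c and c.strip()}
    if len(codes) > 1:
        return "XX"
    if len(codes) == 1:
        return next(iter(codes))
    return None


def apply_country_fix(props, hierarchy_country_codes):
    """Return a copy of props with iso:country and wof:country updated.

    Leaves the properties unchanged if no code can be resolved.
    """
    code = resolve_country_code(hierarchy_country_codes)
    if code is None:
        return props
    fixed = dict(props)
    fixed["iso:country"] = code
    fixed["wof:country"] = code
    return fixed
```

Moving the record to the matching per-country repo would be a separate step after the properties are updated.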
Some records in the Faroe Islands also have mismatching wof:country and iso:country codes - maintaining both "FO" (correct) and "FR" (incorrect).
Example: https://spelunker.whosonfirst.org/id/101873099/
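A minimal sketch for flagging records like this one, assuming the record's properties are a plain dict (`country_codes_disagree` is an illustrative name only):

```python
def country_codes_disagree(props):
    """True if both wof:country and iso:country are set and differ
    after trimming whitespace and upper-casing."""
    wof = (props.get("wof:country") or "").strip().upper()
    iso = (props.get("iso:country") or "").strip().upper()
    return bool(wof) and bool(iso) and wof != iso
```

This only reports disagreement; deciding which of the two values is correct still needs the hierarchy or a manual look.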
Others from #2084:
- GP: consider whether this should be in FR instead, since it's an overseas region of France. If so, move items.
- GF: consider whether this should be in FR instead, since it's an overseas region of France. If so, move items.
- SJ: investigate
- UM: country code is valid for United States Minor Outlying Islands, but investigate where it should go