
Micronesia regexes

Open mattkerlogue opened this issue 4 months ago • 5 comments

Related to #289: I've recently been working with a table that lists Micronesia (the country) solely as "Micronesia" rather than "Federated States of Micronesia", so countrycode returns NA for it.

The discussion at #289 mentions distinguishing between the subregion and the country. On inspecting the codelist dataset, however, that distinction seems to be applied only in the English regex; the French, German and Italian regexes test only for the name of the subregion.

I've certainly seen datasets where the country is just referred to as "Micronesia", but I've also seen it abbreviated as "FS Micronesia" or "F.S. Micronesia", which the current English regex would also miss. Moreover, country.name.de is simply the subregion name "Mikronesien" rather than the full country name (e.g. "Mikronesien (Föderierten Staaten von)").

countrycode::codelist |>
  dplyr::filter(iso3c == "FSM") |>
  dplyr::select(
    country.name.en, country.name.fr, country.name.de, country.name.it,
    country.name.en.regex, country.name.fr.regex,
    country.name.de.regex, country.name.it.regex) |>
  dplyr::glimpse()

#>  Rows: 1
#>  Columns: 8
#>  $ country.name.en       <chr> "Micronesia (Federated States of)"
#>  $ country.name.fr       <chr> "Micronésie (États fédérés de)"
#>  $ country.name.de       <chr> "Mikronesien"
#>  $ country.name.it       <chr> NA
#>  $ country.name.en.regex <chr> "fed.*micronesia|micronesia.*fed"
#>  $ country.name.fr.regex <chr> "micron(é|e)sie"
#>  $ country.name.de.regex <chr> "mikronesien"
#>  $ country.name.it.regex <chr> "micronesia"
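
To make the mismatch concrete, here is a minimal base R sketch that applies the regexes from the output above to the name variants mentioned in this issue. It assumes matching is case-insensitive with Perl-style regexes (`ignore.case = TRUE, perl = TRUE`); the variable names are my own and are not part of countrycode.

```r
# English regex for FSM, copied from the codelist output above
fsm_regex_en <- "fed.*micronesia|micronesia.*fed"

# Name variants mentioned in this issue
variants <- c(
  "Federated States of Micronesia",
  "Micronesia (Federated States of)",
  "Micronesia",
  "FS Micronesia",
  "F.S. Micronesia"
)

# Assumption: case-insensitive, Perl-style matching
matched <- grepl(fsm_regex_en, variants, ignore.case = TRUE, perl = TRUE)
print(setNames(matched, variants))

# The other three regexes, tested against the bare subregion name
other_regexes <- c(fr = "micron(é|e)sie", de = "mikronesien", it = "micronesia")
bare_names    <- c(fr = "Micronésie", de = "Mikronesien", it = "Micronesia")
bare_match <- mapply(
  grepl, other_regexes, bare_names,
  MoreArgs = list(ignore.case = TRUE, perl = TRUE)
)
print(bare_match)
```

Under these assumptions only the first two English variants match, while the French, German and Italian patterns all accept the bare name.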

In my experience, lists that include continents or continental subregions alongside countries are rare, and when I do meet one I'd ordinarily remove those entries before running countrycode() on it. So it surprised me that "Micronesia" didn't return a country code.

Given that "Micronesia" is the only geographic term that can so closely be attributed to either a country or a region, I would expect it to return the country code rather than NA.
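
As a purely illustrative sketch of what that expectation could look like, the pattern below extends the current English regex with two hypothetical alternatives: one for the "FS"/"F.S." abbreviations and one for the bare name on its own. The pattern and the names `candidate_regex` and `inputs` are my own assumptions, not something taken from countrycode or from #289.

```r
# Hypothetical candidate: the existing English regex plus two extra alternatives
candidate_regex <- paste(
  "fed.*micronesia",                       # existing alternative
  "micronesia.*fed",                       # existing alternative
  "^f\\.?\\s?s\\.?\\s+micronesia$",        # "FS Micronesia", "F.S. Micronesia"
  "^micronesia$",                          # the bare name, and nothing else
  sep = "|"
)

inputs <- c(
  "Federated States of Micronesia",
  "Micronesia",
  "FS Micronesia",
  "F.S. Micronesia",
  "Micronesia (region)",
  "Polynesia"
)

# Assumption: case-insensitive, Perl-style matching
hits <- grepl(candidate_regex, inputs, ignore.case = TRUE, perl = TRUE)
print(setNames(hits, inputs))
```

The trade-off is that a bare "Micronesia" that really means the subregion would also be coded as FSM, which is exactly the country/subregion ambiguity raised in #289.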

mattkerlogue, Feb 15 '24 16:02