dataprep
clean_country(): countries belonging to the UK are not recognized as countries
clean_country() returns NaN when applied to England and Scotland. I believe this happens for all countries belonging to the UK. It would be nice if the function recognized both cases, for example United Kingdom and England, as different countries, depending on the input.
Thanks for creating such an amazing library! :)
Hi! Thank you for your brilliant advice. You're right that we need to consider the details of different countries! Also, if you are interested, you are welcome to add what you like to country_data.tsv and open a PR!
Hi, I just started looking at this project! It looks amazing, @qidanrui! :)
By the way, this issue occurs because the countries inside the UK are not ISO countries (see the list on Wikipedia: Ireland is there, but neither Northern Ireland nor England is). I saw that a similar issue exists in the PHP repo umpirsky/country-list.
Maybe an option in clean_country() would be nice?
clean_country(include_non_iso=True/False), default False
When enabled, it would use the data from country_data.tsv together with a new file, country_non_iso_data.tsv (with the list of UK countries, and maybe more if there are any 🤔), since ISO apparently is the norm in all country lists and packages.
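A minimal sketch of the proposed option, in Python since dataprep is a Python library. This is not dataprep's implementation: `clean_country_sketch`, the inline TSV snippets, their column names, and the contents of country_non_iso_data.tsv are all hypothetical stand-ins. It only illustrates the idea that the non-ISO table is consulted when `include_non_iso` is True, and that unmatched names fall back to NaN.

```python
import csv
import io
import math

# Tiny stand-in for country_data.tsv. The column names are assumptions
# for illustration, not the real file's schema.
ISO_TSV = """country\talpha-2\talpha-3\tnumeric
United Kingdom\tGB\tGBR\t826
Ireland\tIE\tIRL\t372
"""

# Stand-in for the hypothetical country_non_iso_data.tsv: UK constituent
# countries have no alpha-2/alpha-3/numeric codes of their own, so those
# fields are left empty.
NON_ISO_TSV = """country\talpha-2\talpha-3\tnumeric
England\t\t\t
Scotland\t\t\t
Wales\t\t\t
Northern Ireland\t\t\t
"""


def load_names(tsv_text):
    """Map each lower-cased country name to its canonical spelling."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["country"].lower(): row["country"] for row in reader}


def clean_country_sketch(value, include_non_iso=False):
    """Hypothetical lookup that only uses the non-ISO table when asked to."""
    names = load_names(ISO_TSV)
    if include_non_iso:
        names.update(load_names(NON_ISO_TSV))
    # Unrecognized input becomes NaN, mirroring the behavior in this issue.
    return names.get(value.strip().lower(), math.nan)
```

With the default, `clean_country_sketch("England")` gives NaN as in the report, while `clean_country_sketch("England", include_non_iso=True)` gives "England" and "United Kingdom" stays a separate country.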
By the way, another issue is that these are not ISO countries but ISO "principal subdivisions of a country". Their ISO codes are tied to the UK ones, like GB-ENG for England (https://en.wikipedia.org/wiki/ISO_3166-2:GB), so we don't have proper values for the alpha-2, alpha-3 and numeric columns (nor for the regex column, but we can put the country name there).
I saw that NaN values are not a problem in country_data.tsv, but I guess the codes are strings of at most 2 or 3 characters.
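One possible way to store such a row, sketched in Python under assumptions: the extra `iso_3166_2` column is hypothetical, and the numeric code is assumed to be stored as a 3-digit string. England keeps NaN in the alpha-2, alpha-3 and numeric columns, and a small check confirms that any code that is present has the expected length. A subdivision code like GB-ENG (6 characters) would fail that check, which is why it cannot simply go into the existing code columns.

```python
import math

# Hypothetical layout for a non-ISO row. "iso_3166_2" is an assumed extra
# column; the ISO 3166-1 code columns stay NaN because England has no
# codes of its own there.
england = {
    "country": "England",
    "alpha-2": math.nan,
    "alpha-3": math.nan,
    "numeric": math.nan,
    "iso_3166_2": "GB-ENG",
}

# Expected string length of each ISO 3166-1 code column (numeric assumed
# to be stored as a 3-digit string such as "826").
EXPECTED_LENGTH = {"alpha-2": 2, "alpha-3": 3, "numeric": 3}


def is_missing(value):
    """True for NaN placeholders."""
    return isinstance(value, float) and math.isnan(value)


def code_columns_ok(row):
    """True if every code column is either missing or a string of the expected length."""
    for column, length in EXPECTED_LENGTH.items():
        value = row[column]
        if is_missing(value):
            continue
        if not isinstance(value, str) or len(value) != length:
            return False
    return True
```

Here `code_columns_ok(england)` is True, while putting the subdivision code into a code column, e.g. `code_columns_ok({**england, "alpha-3": "GB-ENG"})`, is False.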