ety-python
Add emoji flags to languages
As mentioned on #25
The command line interface could have an -e flag for displaying relevant emojis alongside languages and maybe words.
Very low priority feature, but might be fun to implement and use
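As a rough sketch of what an opt-in switch could look like, the argparse parser below adds only the proposed -e flag next to a positional word. This is not ety's actual CLI: the function name build_parser and the long form --emoji are hypothetical choices for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: ety's real CLI options are not reproduced here,
    # only a positional word plus the proposed -e switch.
    parser = argparse.ArgumentParser(prog="ety")
    parser.add_argument("word", help="word to look up")
    parser.add_argument(
        "-e",
        "--emoji",  # hypothetical long form of the proposed flag
        action="store_true",
        help="show an emoji flag alongside each language",
    )
    return parser
```

With action="store_true" the flag defaults to False, so existing output stays unchanged unless -e is passed.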
Reference: in Unicode, the characters used to display flags (regional indicator symbols) sit at the code point of the corresponding capital letter plus 127397. [source]
So chr(ord("A") + 127397) = 🇦
>>> chr(ord("G") + 127397) + chr(ord("B") + 127397)
'🇬🇧'
>>> chr(ord("F") + 127397) + chr(ord("R") + 127397)
'🇫🇷'
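A slightly more defensive version of the same trick might look like the sketch below. The helper name flag_from_code is hypothetical; the offset is written as the distance from ord("A") to U+1F1E6 (REGIONAL INDICATOR SYMBOL LETTER A), which works out to the same 127397.

```python
# Regional Indicator Symbol Letter A is U+1F1E6; subtracting ord("A")
# gives the offset used in the REPL examples.
REGIONAL_INDICATOR_OFFSET = 0x1F1E6 - ord("A")  # == 127397


def flag_from_code(code: str) -> str:
    """Turn a two-letter code such as "GB" into a pair of regional indicators."""
    code = code.upper()
    if len(code) != 2 or not all("A" <= ch <= "Z" for ch in code):
        raise ValueError(f"expected two ASCII letters, got {code!r}")
    return "".join(chr(ord(ch) + REGIONAL_INDICATOR_OFFSET) for ch in code)


print(flag_from_code("fr"))  # prints 🇫🇷
```

Upper-casing first means lowercase codes such as "fr" work too; anything that is not two ASCII letters raises instead of producing a meaningless character.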
The iso-639-3.json already has these country codes, so this could be a feature of the Language class.
>>> import ety
>>> fr = ety.Language("fra")
>>> fr.emoji
'🇫🇷'
{
"name": "French",
"type": "living",
"scope": "individual",
"iso6393": "fra",
"iso6392B": "fre",
"iso6392T": "fra",
"iso6391": "fr"
}
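A minimal sketch of the proposal, assuming each language is built from one entry shaped like the JSON above. LanguageSketch is a hypothetical stand-in, not ety's Language class, and it reads the "iso6391" field on the assumption that the two-letter language code can double as a country code, which holds only for some languages.

```python
from typing import Optional

REGIONAL_INDICATOR_OFFSET = 127397


class LanguageSketch:
    """Hypothetical stand-in for ety's Language, built from one JSON entry."""

    def __init__(self, entry: dict):
        self.name = entry["name"]
        self.iso = entry["iso6393"]
        # Not every ISO 639-3 language has a two-letter ISO 639-1 code.
        self.iso6391 = entry.get("iso6391")

    @property
    def emoji(self) -> Optional[str]:
        if not self.iso6391:
            return None
        # Assumption: the ISO 639-1 code is also a valid ISO 3166-1 alpha-2
        # country code. This is only sometimes true.
        return "".join(
            chr(ord(ch) + REGIONAL_INDICATOR_OFFSET) for ch in self.iso6391.upper()
        )


french = {
    "name": "French", "type": "living", "scope": "individual",
    "iso6393": "fra", "iso6392B": "fre", "iso6392T": "fra", "iso6391": "fr",
}
print(LanguageSketch(french).emoji)  # prints 🇫🇷
```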
Adding 127397 to each code letter is a neat trick (I'd never seen it before!), but there's a bit of a problem here.
ISO 3166-1 alpha-2 is for countries, and is the code used for mapping flags.
ISO 639 is for languages.
It's fine for many, like French, but there are a few which don't have the same code in both standards.
And it's a bit tricky to choose a flag for a language, as some countries use many languages, and some languages are used by many countries. (This is also a problem in UX; see, for example, http://www.flagsarenotlanguages.com/blog/why-flags-do-not-represent-language/)
Demo
iso-639-3.json now only contains the 3-char language codes (ISO 639-3, e.g. "fra") and no longer contains the 2-char codes (ISO 639-1, e.g. "fr"). So this uses the pycountry library (pip install pycountry) to get the ISO 639-1 code from the ISO 639-3 code, then assumes it's an ISO 3166-1 alpha-2 country code and returns the flag:
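import pycountry
...
class Language(object):
    ...
    @property
    def emoji(self):
        try:
            # ISO 639-3 code (e.g. "fra") -> ISO 639-1 code (e.g. "FR")
            alpha_2 = pycountry.languages.get(alpha_3=self.iso).alpha_2.upper()
            # Assume the ISO 639-1 code is also an ISO 3166-1 alpha-2 country code
            return chr(ord(alpha_2[0]) + 127397) + chr(ord(alpha_2[1]) + 127397)
        except (AttributeError, KeyError):
            # No matching language, or it has no two-letter code
            # (some pycountry versions raise KeyError on a failed lookup)
            return None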
Then running this:
import ety
from ety.data import langs

for code in langs:
    lang = ety.Language(code)
    if lang.emoji is not None:
        print(lang.emoji, lang)
Gives:
🇦🇦 Afar
🇦🇧 Abkhazian
🇦🇫 Afrikaans
🇦🇰 Akan
🇦🇲 Amharic
🇦🇷 Arabic
🇦🇳 Aragonese
🇦🇸 Assamese
🇦🇻 Avaric
🇦🇪 Avestan
🇦🇾 Aymara
🇦🇿 Azerbaijani
🇧🇦 Bashkir
🇧🇲 Bambara
🇧🇪 Belarusian
🇧🇳 Bengali
🇧🇮 Bislama
🇧🇴 Tibetan
🇧🇸 Bosnian
🇧🇷 Breton
🇧🇬 Bulgarian
🇨🇦 Catalan
🇨🇸 Czech
🇨🇭 Chamorro
🇨🇪 Chechen
🇨🇺 Church Slavic
🇨🇻 Chuvash
🇰🇼 Cornish
🇨🇴 Corsican
🇨🇷 Cree
🇨🇾 Welsh
🇩🇦 Danish
🇩🇪 German
🇩🇻 Dhivehi
🇩🇿 Dzongkha
🇪🇱 Modern Greek (1453-)
🇪🇳 English
🇪🇴 Esperanto
🇪🇹 Estonian
🇪🇺 Basque
🇪🇪 Ewe
🇫🇴 Faroese
🇫🇦 Persian
🇫🇯 Fijian
🇫🇮 Finnish
🇫🇷 French
🇫🇾 Western Frisian
🇫🇫 Fulah
🇬🇩 Scottish Gaelic
🇬🇦 Irish
🇬🇱 Galician
🇬🇻 Manx
🇬🇳 Guarani
🇬🇺 Gujarati
🇭🇹 Haitian
🇭🇦 Hausa
🇸🇭 Serbo-Croatian
🇭🇪 Hebrew
🇭🇿 Herero
🇭🇮 Hindi
🇭🇴 Hiri Motu
🇭🇷 Croatian
🇭🇺 Hungarian
🇭🇾 Armenian
🇮🇬 Igbo
🇮🇴 Ido
🇮🇮 Sichuan Yi
🇮🇺 Inuktitut
🇮🇪 Interlingue
🇮🇦 Interlingua (International Auxiliary Language Association)
🇮🇩 Indonesian
🇮🇰 Inupiaq
🇮🇸 Icelandic
🇮🇹 Italian
🇯🇻 Javanese
🇯🇦 Japanese
🇰🇱 Kalaallisut
🇰🇳 Kannada
🇰🇸 Kashmiri
🇰🇦 Georgian
🇰🇷 Kanuri
🇰🇰 Kazakh
🇰🇲 Khmer
🇰🇮 Kikuyu
🇷🇼 Kinyarwanda
🇰🇾 Kirghiz
🇰🇻 Komi
🇰🇬 Kongo
🇰🇴 Korean
🇰🇯 Kuanyama
🇰🇺 Kurdish
🇱🇴 Lao
🇱🇦 Latin
🇱🇻 Latvian
🇱🇮 Limburgan
🇱🇳 Lingala
🇱🇹 Lithuanian
🇱🇧 Luxembourgish
🇱🇺 Luba-Katanga
🇱🇬 Ganda
🇲🇭 Marshallese
🇲🇱 Malayalam
🇲🇷 Marathi
🇲🇰 Macedonian
🇲🇬 Malagasy
🇲🇹 Maltese
🇲🇳 Mongolian
🇲🇮 Maori
🇲🇸 Malay (macrolanguage)
🇲🇾 Burmese
🇳🇦 Nauru
🇳🇻 Navajo
🇳🇷 South Ndebele
🇳🇩 North Ndebele
🇳🇬 Ndonga
🇳🇪 Nepali (macrolanguage)
🇳🇱 Dutch
🇳🇳 Norwegian Nynorsk
🇳🇧 Norwegian Bokmål
🇳🇴 Norwegian
🇳🇾 Nyanja
🇴🇨 Occitan (post 1500)
🇴🇯 Ojibwa
🇴🇷 Oriya (macrolanguage)
🇴🇲 Oromo
🇴🇸 Ossetian
🇵🇦 Panjabi
🇵🇮 Pali
🇵🇱 Polish
🇵🇹 Portuguese
🇵🇸 Pushto
🇶🇺 Quechua
🇷🇲 Romansh
🇷🇴 Romanian
🇷🇳 Rundi
🇷🇺 Russian
🇸🇬 Sango
🇸🇦 Sanskrit
🇸🇮 Sinhala
🇸🇰 Slovak
🇸🇱 Slovenian
🇸🇪 Northern Sami
🇸🇲 Samoan
🇸🇳 Shona
🇸🇩 Sindhi
🇸🇴 Somali
🇸🇹 Southern Sotho
🇪🇸 Spanish
🇸🇶 Albanian
🇸🇨 Sardinian
🇸🇷 Serbian
🇸🇸 Swati
🇸🇺 Sundanese
🇸🇼 Swahili (macrolanguage)
🇸🇻 Swedish
🇹🇾 Tahitian
🇹🇦 Tamil
🇹🇹 Tatar
🇹🇪 Telugu
🇹🇬 Tajik
🇹🇱 Tagalog
🇹🇭 Thai
🇹🇮 Tigrinya
🇹🇴 Tonga (Tonga Islands)
🇹🇳 Tswana
🇹🇸 Tsonga
🇹🇰 Turkmen
🇹🇷 Turkish
🇹🇼 Twi
🇺🇬 Uighur
🇺🇰 Ukrainian
🇺🇷 Urdu
🇺🇿 Uzbek
🇻🇪 Venda
🇻🇮 Vietnamese
🇻🇴 Volapük
🇼🇦 Walloon
🇼🇴 Wolof
🇽🇭 Xhosa
🇾🇮 Yiddish
🇾🇴 Yoruba
🇿🇦 Zhuang
🇿🇭 Chinese
🇿🇺 Zulu
Some clear mismatches:
🇦🇫 Afrikaans
🇦🇷 Arabic
🇧🇪 Belarusian
🇧🇷 Breton
🇨🇦 Catalan
🇨🇭 Chamorro
🇰🇼 Cornish
🇨🇾 Welsh
🇪🇪 Ewe
🇮🇪 Interlingue
🇸🇻 Swedish
Are you sure they're mismatches? A few of those just look like the country code is derived from the language's native name. I come from Devon, UK (next to Cornwall), so the first thing I noticed was that the Cornish for 'Cornwall' is 'Kernow', which probably explains its 🇰🇼 code. Similarly, Welsh is 'Cymraeg' or something in Welsh, which should explain its 🇨🇾 code.
Looking further into it, these all seem to be the two-char ISO 639-1 codes, rather than the three-char ISO 639-3 codes used by this library.
So, tl;dr: your code looks good to me! Feel free to PR it with a CLI arg to enable it!
I'm sure they're mismatches. Languages != countries.
"🇰🇼 Cornish"
That is not the Cornish flag, it's the flag of Kuwait.
Language | ISO 639-3 language code | ISO 639-1 language code | Flag |
---|---|---|---|
Cornish | cor | kw | ![]() |
Country | ISO 3166-1 alpha-2 country code | Flag |
---|---|---|
Kuwait | KW | 🇰🇼 |
"🇨🇾 Welsh"
That is not the Welsh flag, it's the flag of Cyprus.
Language | ISO 639-3 language code | ISO 639-1 language code | Flag |
---|---|---|---|
Welsh | cym | cy | ![]() |
Country | ISO 3166-1 alpha-2 country code | Flag |
---|---|---|
Cyprus | CY | 🇨🇾 |
"🇸🇻 Swedish"
That is not the Swedish flag, it's the flag of El Salvador.
Language | ISO 639-3 language code | ISO 639-1 language code | Flag |
---|---|---|---|
Swedish | swe | sv | ![]() |
Country | ISO 3166-1 alpha-2 country code | Flag |
---|---|---|
El Salvador | SV | 🇸🇻 |
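Decoding a flag back into its letters makes the point concrete: the glyph only carries two letters, and a renderer interprets them as an ISO 3166-1 region, never as a language. The helper code_from_flag below is a hypothetical sketch, not part of ety.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def code_from_flag(flag: str) -> str:
    """Recover the ASCII letters encoded by a string of regional indicators."""
    letters = []
    for ch in flag:
        offset = ord(ch) - REGIONAL_INDICATOR_A
        if not 0 <= offset < 26:
            raise ValueError(f"{ch!r} is not a regional indicator symbol")
        letters.append(chr(ord("A") + offset))
    return "".join(letters)


# The "Cornish", "Welsh" and "Swedish" flags from the demo output:
for flag in ("\U0001F1F0\U0001F1FC", "\U0001F1E8\U0001F1FE", "\U0001F1F8\U0001F1FB"):
    print(flag, code_from_flag(flag))  # KW, CY, SV: Kuwait, Cyprus, El Salvador
```

So "sv" (Swedish) becomes SV (El Salvador), whereas Sweden itself is SE.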
Oops, sorry, my mistake; you're right. For some reason I'm seeing different things on different devices: Chrome on my laptop displays letters that seem to map to ISO 639-1 codes, while Chrome on my phone shows the wrong flags you mentioned 🤔
Maybe there's a free dataset somewhere mapping ISO 639-3 codes to flag emojis we could use?
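If such a dataset turned up (or were maintained by hand), the lookup would become an explicit table rather than reusing the language code. The sketch below assumes a hypothetical LANGUAGE_TO_COUNTRY dict; its three entries are illustrative picks, since which country should represent a language is an editorial choice the standards do not define. Unmapped languages such as Cornish ("cor") get no flag rather than a wrong one.

```python
from typing import Dict, Optional

REGIONAL_INDICATOR_OFFSET = 127397

# Hypothetical hand-picked mapping: ISO 639-3 language code -> ISO 3166-1
# alpha-2 country code. Entries here are illustrative only.
LANGUAGE_TO_COUNTRY: Dict[str, str] = {
    "fra": "FR",
    "swe": "SE",
    "deu": "DE",
}


def language_flag(iso6393: str) -> Optional[str]:
    """Return a flag for an ISO 639-3 code, or None if no country is mapped."""
    country = LANGUAGE_TO_COUNTRY.get(iso6393)
    if country is None:
        return None
    return "".join(chr(ord(ch) + REGIONAL_INDICATOR_OFFSET) for ch in country)


print(language_flag("swe"))  # prints 🇸🇪
print(language_flag("cor"))  # prints None
```

Returning None for unmapped languages lets the CLI simply omit the emoji in those cases.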