Ability to change skin tone and gender of all emojis in a text
I would like to let users choose their skin tone and gender, and then have any submitted text automatically scanned for emojis so that, where a variant is available, the skin tone and gender are changed in the text.
Any hint how this could be achieved? Also would you be open to a PR to add such functionality?
We currently don't have gender or skin tone information stored in our data files. I think we would need that information for every emoji that has a gender or skin tone. Without it, this is not straightforward to do.
Our data looks like this:
{
    '\U0001F468\U0000200D\U0001F680': {  # 👨🚀
        'en' : ':man_astronaut:',
    },
    '\U0001F469\U0000200D\U0001F680': {  # 👩🚀
        'en' : ':woman_astronaut:',
    },
    '\U0001F469\U0001F3FF\U0000200D\U0001F680': {  # 👩🏿🚀
        'en' : ':woman_astronaut_dark_skin_tone:',
    },
    '\U0001F474\U0001F3FD': {  # 👴🏽
        'en' : ':old_man_medium_skin_tone:',
    },
    '\U0001F9D3\U0001F3FF': {  # 🧓🏿
        'en' : ':older_person_dark_skin_tone:',
    },
    '\U0001F475': {  # 👵
        'en' : ':old_woman:',
    },
}
We could annotate it with gender and skin tone. I think we would need something like this:
{
    '\U0001F468\U0000200D\U0001F680': {  # 👨🚀
        'en' : ':man_astronaut:',
        'base' : 'astronaut',
        'gender': 'male',
        'skin_tone': 'none',
    },
    '\U0001F469\U0000200D\U0001F680': {  # 👩🚀
        'en' : ':woman_astronaut:',
        'base' : 'astronaut',
        'gender': 'female',
        'skin_tone': 'none',
    },
    '\U0001F469\U0001F3FF\U0000200D\U0001F680': {  # 👩🏿🚀
        'en' : ':woman_astronaut_dark_skin_tone:',
        'base' : 'astronaut',
        'gender': 'female',
        'skin_tone': 'dark',
    },
    '\U0001F474\U0001F3FD': {  # 👴🏽
        'en' : ':old_man_medium_skin_tone:',
        'base' : 'old_person',
        'gender': 'male',
        'skin_tone': 'medium',
    },
    '\U0001F9D3\U0001F3FF': {  # 🧓🏿
        'en' : ':older_person_dark_skin_tone:',
        'base' : 'old_person',
        'gender': 'person',
        'skin_tone': 'dark',
    },
    '\U0001F475': {  # 👵
        'en' : ':old_woman:',
        'base' : 'old_person',
        'gender': 'female',
        'skin_tone': 'none',
    },
}
(BTW apart from gender and skin tone there are also different hair styles: blond_hair, red_hair, curly_hair, bald, etc)
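To make the idea concrete, here is a rough sketch of how such annotated data could be queried. The 'base', 'gender' and 'skin_tone' keys are the hypothetical annotations proposed above, not something the data files contain today, and `build_index` and `find_variant` are made-up helper names.

```python
# Hypothetical annotated data, using the proposed 'base'/'gender'/'skin_tone' keys.
ANNOTATED = {
    '\U0001F468\U0000200D\U0001F680': {
        'en': ':man_astronaut:', 'base': 'astronaut',
        'gender': 'male', 'skin_tone': 'none',
    },
    '\U0001F469\U0000200D\U0001F680': {
        'en': ':woman_astronaut:', 'base': 'astronaut',
        'gender': 'female', 'skin_tone': 'none',
    },
    '\U0001F469\U0001F3FF\U0000200D\U0001F680': {
        'en': ':woman_astronaut_dark_skin_tone:', 'base': 'astronaut',
        'gender': 'female', 'skin_tone': 'dark',
    },
}


def build_index(data):
    """Map (base, gender, skin_tone) to the emoji string."""
    return {
        (info['base'], info['gender'], info['skin_tone']): emj
        for emj, info in data.items()
    }


def find_variant(emj, data, index, gender=None, skin_tone=None):
    """Return the annotated variant of `emj` with the requested gender and/or
    skin tone. If `emj` is unknown or no such variant exists, return `emj`."""
    info = data.get(emj)
    if info is None:
        return emj
    key = (
        info['base'],
        gender or info['gender'],
        skin_tone or info['skin_tone'],
    )
    return index.get(key, emj)


INDEX = build_index(ANNOTATED)
man_astronaut = '\U0001F468\U0000200D\U0001F680'
woman_astronaut = find_variant(man_astronaut, ANNOTATED, INDEX, gender='female')
```

Falling back to the original emoji when no variant is annotated keeps the text unchanged for emoji that have no gender or skin tone variants.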
Here is an example of why it is not straightforward:
Let's use :woman_detective_medium-light_skin_tone:
🕵🏼♀️
In Unicode it is:
1F575 1F3FC 200D 2640 FE0F
- 1F575: detective :detective:
- 1F3FC: medium-light skin tone
- 200D: zero-width-joiner (this indicates that the preceding and following characters should be displayed as one emoji)
- 2640: female sign ♀
- FE0F: emoji-presentation-selector (indicates that an emoji is shown instead of text, e.g. 2640 is ♀ but 2640 FE0F is :female_sign:)
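The breakdown above can be reproduced with the standard library. This is only a small sketch; `describe` is a made-up helper name, and the character names come from whatever Unicode version your Python build ships with.

```python
import unicodedata


def describe(emoji_str):
    """Return (hex code point, Unicode character name) for each code point."""
    return [
        (f"{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in emoji_str
    ]


# :woman_detective_medium-light_skin_tone:
detective = "\U0001F575\U0001F3FC\u200D\u2640\uFE0F"
for code, name in describe(detective):
    print(code, name)
```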
In this case we could simply replace the skin tone or the gender with our desired replacement. However, it is also valid to have an emoji without gender or skin tone, for example: :detective: 1F575 FE0F (detective + emoji-presentation-selector) or 🕵🏼 1F575 1F3FC (detective + medium-light skin tone). There are also different ways to define the gender, not just :female_sign:/:male_sign:.
There are five skin tones: light_skin_tone, medium-light_skin_tone, medium_skin_tone, medium-dark_skin_tone and dark_skin_tone. By default an emoji has no skin tone; for some emoji, a modifier can be added to select one.
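As a sketch of the easy case: the five skin tone modifiers are the code points U+1F3FB to U+1F3FF, so an existing modifier can be swapped for another one with a plain character replacement. `SKIN_TONES` and `swap_skin_tone` are names made up for this illustration. Adding a modifier to an emoji that has none, or removing one, is deliberately left out, because that is exactly where the missing per-emoji data is needed.

```python
# The five skin tone modifier code points, U+1F3FB..U+1F3FF.
SKIN_TONES = {
    "light": "\U0001F3FB",
    "medium-light": "\U0001F3FC",
    "medium": "\U0001F3FD",
    "medium-dark": "\U0001F3FE",
    "dark": "\U0001F3FF",
}
_MODIFIERS = set(SKIN_TONES.values())


def swap_skin_tone(text, tone):
    """Replace every skin tone modifier already present in `text` with `tone`.

    Emoji without a modifier are left untouched.
    """
    new_modifier = SKIN_TONES[tone]
    return "".join(new_modifier if ch in _MODIFIERS else ch for ch in text)


old_man_medium = "\U0001F474\U0001F3FD"
old_man_dark = swap_skin_tone(old_man_medium, "dark")
```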
There are three genders: male, female and person, but different ways to represent them:
- Some emoji are without gender and support a modifier male_sign :male_sign:/female_sign :female_sign: (like the detective example)
- Some emoji are without gender and support a modifier man 👨/woman 👩
- Some emoji are without gender and support a modifier man 👨/woman 👩/person 🧑
- Some emoji have a gender (the emoji gender cannot be modified), but there are different emoji for each gender: prince :prince: (unicode 1F934)/princess :princess: (unicode 1F478) or old_man 👴 (unicode 1F474)/old_woman 👵 (unicode 1F475)/older_person 🧓 (unicode 1F9D3)
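A naive sketch of the first three cases, to show how far plain code point substitution gets. `swap_gender` is a made-up helper, and the result is not guaranteed to be a recommended sequence (family emoji, for example, also start with man/woman). The fourth case (prince/princess, old_man/old_woman) cannot be handled this way at all and needs a lookup table.

```python
ZWJ = "\u200D"
# Gender signs that appear after a zero-width-joiner (detective example).
GENDER_SIGN = {"male": "\u2642", "female": "\u2640"}
# man / woman / person used as the first element of a ZWJ sequence.
GENDER_PERSON = {
    "male": "\U0001F468",
    "female": "\U0001F469",
    "person": "\U0001F9D1",
}


def swap_gender(seq, gender):
    """Naively swap the gender of a single emoji sequence.

    Only handles a gender sign after a ZWJ, or a leading man/woman/person in
    a ZWJ sequence. Anything else is returned unchanged.
    """
    chars = list(seq)
    for i, ch in enumerate(chars):
        if ch in GENDER_SIGN.values() and i > 0 and chars[i - 1] == ZWJ:
            # Sign-based gender; 'person' has no sign, so leave it unchanged.
            if gender in GENDER_SIGN:
                chars[i] = GENDER_SIGN[gender]
        elif i == 0 and ch in GENDER_PERSON.values() and ZWJ in seq:
            chars[i] = GENDER_PERSON[gender]
    return "".join(chars)


woman_detective = "\U0001F575\U0001F3FC\u200D\u2640\uFE0F"
man_detective = swap_gender(woman_detective, "male")
```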
The Unicode specs for all of this are here: http://www.unicode.org/reports/tr51/
There are also test files available from unicode.org that contain all recommended combinations of skin tone and gender: https://unicode.org/Public/emoji/14.0/emoji-sequences.txt https://unicode.org/Public/emoji/14.0/emoji-zwj-sequences.txt
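Those files are plain text with semicolon-separated fields (code points ; type ; description) and `#` comments, so a database could probably be bootstrapped from them. Below is a rough parser sketch for a single line. `parse_sequence_line` is a made-up name, the sample line only illustrates the format rather than being copied from the file, and code point ranges (`XXXX..YYYY`) are simply skipped.

```python
def parse_sequence_line(line):
    """Parse one line of emoji-sequences.txt / emoji-zwj-sequences.txt.

    Returns (emoji, type_field, description), or None for blank lines,
    comment lines and code point ranges.
    """
    content = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not content:
        return None
    fields = [field.strip() for field in content.split(";")]
    if len(fields) < 3:
        return None
    code_points, type_field, description = fields[:3]
    if ".." in code_points:
        return None  # ranges such as 231A..231B are not handled in this sketch
    emoji_str = "".join(chr(int(cp, 16)) for cp in code_points.split())
    return emoji_str, type_field, description


# Illustrative line in the format used by the files.
sample = "1F469 1F3FF 200D 1F680 ; RGI_Emoji_ZWJ_Sequence ; woman astronaut: dark skin tone # E4.0 [1]"
parsed = parse_sequence_line(sample)
```

The description field ("woman astronaut: dark skin tone") already separates the base name from the skin tone, which might help when filling in the 'base' and 'skin_tone' annotations.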
Thank you for this analysis!
So the main challenge is basically building up this database to know which modifiers exist for which emoji?
Yes, I believe that would be the only big obstacle.
Replacing emojis can be done with the existing function emoji.replace_emoji, like so:
import emoji

def replace_fct(match_str, match_data):
    # Called once for every emoji found in the text
    print(match_str)
    # 🤴
    print(match_data)
    # {'en': ':prince:',
    #  'status': 2,
    #  'E': 3,
    #  'de': ':prinz:',
    #  'es': ':príncipe:',
    #  'fr': ':prince:',
    #  'pt': ':príncipe:',
    #  'it': ':principe:',
    #  'match_start': 11,
    #  'match_end': 12}
    #
    # If there was gender information, there would also be something like:
    #  'gender': 'male'
    #  'base': 'prince/princess'
    #
    # Here you could search through all emoji to find a replacement emoji
    # for emj, data in emoji.EMOJI_DATA.items():
    #     # find emoji with same 'base' and desired gender
    #     if data["base"] == match_data["base"] and data["gender"] == "female":
    #         return emj
    return "👸"

emoji.replace_emoji("This emoji 🤴 is a prince(ss)", replace=replace_fct)
I'll work on this. I might put it in a separate package/library though, since it is probably a very niche thing.
Thank you!