Hexcode compatible with OpenMoji
def hexcode(emoji):
    codes = [hex(ord(e))[2:].upper() for e in emoji]
    return "-".join(codes)
Can we add hexcode to emoji.EMOJI_DATA?
The emoji in https://github.com/hfg-gmuend/openmoji are identified by their hexcode.
I use the emoji package in production for Subreply. I intend to add OpenMoji as soon as they are production ready.
Does that mean you would like to replace a :emoji: with something like <img src="openmoji/HEXCODE.png">?
I wonder whether we should hardcode the hex codes in the EMOJI_DATA dict. They are very easy to generate at runtime with your function, so generating them at runtime may make more sense. I see they have a JSON file with all their emoji at https://github.com/hfg-gmuend/openmoji/blob/master/data/openmoji.json
The question is how similar our emoji are to the OpenMoji data: will the hexcode() function work for every emoji, or are there some emoji that need adjustment or need to be matched by hand? Emoji that contain invisible characters or modifiers like skin color could be a particular problem.
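To make the concern concrete, here is a minimal sketch of what the hexcode() function from the first post produces for multi-code-point emoji. The two sample emoji are my own choice for illustration; whether OpenMoji uses exactly these hexcodes is not checked here.

```python
def hexcode(emoji_str):
    # Same logic as the function in the first post: one hex code per
    # code point, joined with "-".
    codes = [hex(ord(ch))[2:].upper() for ch in emoji_str]
    return "-".join(codes)


# Thumbs up followed by a skin tone modifier (two code points).
thumbs_up_medium = "\U0001F44D\U0001F3FD"

# Man + zero width joiner (invisible) + laptop, i.e. a ZWJ sequence.
man_technologist = "\U0001F468\u200D\U0001F4BB"

print(hexcode(thumbs_up_medium))   # 1F44D-1F3FD
print(hexcode(man_technologist))   # 1F468-200D-1F4BB
```

The modifier and the invisible joiner each show up as their own code, so a match only works if the other data set spells the sequence with exactly the same code points.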
That's what I mean. emoji.hexcode(string) is a better implementation indeed. A test can be run on openmoji.json.
I created a script to test it:
https://replit.com/@cuzi/emoji-to-Openmoji#main.py
print("############## main.py ###############")
import emoji
import requests
import html
def hexcode(emoji):
    # rjust(4, '0') is necessary to convert "2A" to "002A"
    codes = [hex(ord(e))[2:].upper().rjust(4, '0') for e in emoji]
    return "-".join(codes)
# Try to match all emoji from EMOJI_DATA to Openmoji:
openmoji = requests.get("https://github.com/hfg-gmuend/openmoji/raw/master/data/openmoji.json").json()
hexToOpenmoji = {value["hexcode"]: value for value in openmoji}
emojiToOpenmoji = {}
print("Following emoji couldn't be found in Openmoji:")
for emj in emoji.EMOJI_DATA:
    found = False
    if hexcode(emj) in hexToOpenmoji:
        emojiToOpenmoji[emj] = hexToOpenmoji[hexcode(emj)]
        found = True
    elif emj[-1] == '\ufe0f':
        # Remove the emoji variant u+fe0f and try again
        emj_no_variant = emj[0:-1]
        if hexcode(emj_no_variant) in hexToOpenmoji:
            emojiToOpenmoji[emj] = hexToOpenmoji[hexcode(emj_no_variant)]
            found = True
    else:
        # Append the emoji variant u+fe0f and try again
        emj_emoji_variant = emj + '\ufe0f'
        if hexcode(emj_emoji_variant) in hexToOpenmoji:
            emojiToOpenmoji[emj] = hexToOpenmoji[hexcode(emj_emoji_variant)]
            found = True
    if not found and emoji.EMOJI_DATA[emj]['status'] == emoji.STATUS['fully_qualified']:
        print(f"E{emoji.EMOJI_DATA[emj]['E']} {emoji.EMOJI_DATA[emj]['en']} {hexcode(emj)} {emj}")
print("###########################")
def replace_fct(emj, emj_data):
    if emj in emojiToOpenmoji:
        alt = html.escape(emj)
        title = html.escape(emj_data['en'])
        src = html.escape(emojiToOpenmoji[emj]["hexcode"]) + ".svg"
        return f'<img src="{src}" alt="{alt}" title="{title}">'
    else:
        return "Unsupported emoji"
print(emoji.emojize("a lion in html: :lion:", version=-1, handle_version=replace_fct))
For some emoji it is necessary to remove the variant indicator U+FE0F or add it to find the emoji.
With that modification it can match all emoji that are fully-qualified by Unicode except for the newest emoji.
The script lists all emoji it cannot match, and they are all part of Unicode 14.0/E14, which OpenMoji doesn't include yet (hfg-gmuend/openmoji#344).
So generating it at runtime is definitely an option instead of hard-coding it.
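As a rough sketch of how the runtime variant could look without downloading anything: the helper below takes the set of known hexcodes as an argument and tries the exact match first, then the two U+FE0F adjustments described above. The name find_hexcode and the tiny sample set are hypothetical and only for illustration; the set is not real OpenMoji data, and this is not an existing function of the emoji package.

```python
VARIATION_SELECTOR = "\ufe0f"


def hexcode(emoji_str):
    # Zero-pad to at least four digits so that "2A" becomes "002A".
    return "-".join(f"{ord(ch):04X}" for ch in emoji_str)


def find_hexcode(emoji_str, known_hexcodes):
    """Return the matching hexcode from known_hexcodes, or None.

    Hypothetical helper: tries the emoji as-is, then without a trailing
    U+FE0F, then with U+FE0F appended.
    """
    candidates = [emoji_str]
    if emoji_str.endswith(VARIATION_SELECTOR):
        candidates.append(emoji_str[:-1])
    else:
        candidates.append(emoji_str + VARIATION_SELECTOR)
    for candidate in candidates:
        code = hexcode(candidate)
        if code in known_hexcodes:
            return code
    return None


# Made-up sample set standing in for the hexcodes loaded from openmoji.json.
sample_hexcodes = {"1F981", "2764", "002A-FE0F-20E3"}

print(find_hexcode("\U0001F981", sample_hexcodes))    # exact match
print(find_hexcode("\u2764\ufe0f", sample_hexcodes))  # match after removing U+FE0F
print(find_hexcode("\U0001F995", sample_hexcodes))    # no match, prints None
```

Passing the set in as a parameter keeps the lookup testable offline; in practice it would be built once from the hexcode fields of openmoji.json, as in the script above.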