Hexcode compatible with OpenMoji

Open lucianmarin opened this issue 3 years ago • 3 comments

def hexcode(emoji):
  codes = [hex(ord(e))[2:].upper() for e in emoji]
  return "-".join(codes)

Can we add hexcode to emoji.EMOJI_DATA?

OpenMoji emoji (https://github.com/hfg-gmuend/openmoji) are identified by their hexcode.

I use the emoji package in production for Subreply. I intend to add OpenMoji as soon as they are production ready.

lucianmarin, Oct 09 '21 15:10

Does that mean you would like to replace a :emoji: with something like <img src="openmoji/HEXCODE.png">?

I wonder whether we should hardcode the hex codes in the EMOJI_DATA dict. They are very easy to generate at runtime with your function, so generating them at runtime may make more sense. Their JSON file at https://github.com/hfg-gmuend/openmoji/blob/master/data/openmoji.json lists all their emoji. The question is how similar our emoji are to the OpenMoji data: will the hexcode() function work for every emoji, or do some emoji need adjustment or matching by hand? Emoji that contain invisible characters or modifiers such as skin color could be a particular problem.
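To make the concern about invisible characters and modifiers concrete, here is a minimal sketch of what a runtime hexcode function produces for multi-code-point emoji. The sample strings are written as escapes so the invisible code points stay visible, and padding each value to four digits is an assumption about how OpenMoji writes short code points such as U+002A.

```python
def hexcode(emoji_string):
    # One uppercase hex value per code point, zero-padded to at least 4 digits
    # (the padding is an assumption about OpenMoji's format).
    return "-".join(f"{ord(ch):04X}" for ch in emoji_string)

samples = {
    "thumbs up, medium skin tone": "\U0001F44D\U0001F3FD",
    "woman technologist (ZWJ sequence)": "\U0001F469\u200D\U0001F4BB",
    "red heart with variation selector": "\u2764\uFE0F",
    "keycap asterisk": "\u002A\uFE0F\u20E3",
}

for name, s in samples.items():
    print(f"{name}: {hexcode(s)}")
# thumbs up, medium skin tone: 1F44D-1F3FD
# woman technologist (ZWJ sequence): 1F469-200D-1F4BB
# red heart with variation selector: 2764-FE0F
# keycap asterisk: 002A-FE0F-20E3
```

The skin tone modifier, the zero-width joiner U+200D, and the variation selector U+FE0F all show up as separate hexcode segments, so whether a lookup succeeds depends on the other data set spelling the sequence the same way.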

cvzi, Oct 13 '21 18:10

That's what I mean. emoji.hexcode(string) is a better implementation indeed. A test can be run on openmoji.json.

lucianmarin, Oct 13 '21 19:10

I created a script to test it: https://replit.com/@cuzi/emoji-to-Openmoji#main.py

main.py
print("############## main.py ###############")
import emoji
import requests
import html

def hexcode(emoji):
    #  rjust(4, '0') is necessary to convert "2A" to "002A"
    codes = [hex(ord(e))[2:].upper().rjust(4, '0') for e in emoji]
    return "-".join(codes)

# Try to match all emoji from EMOJI_DATA to Openmoji:
openmoji = requests.get("https://github.com/hfg-gmuend/openmoji/raw/master/data/openmoji.json").json()
hexToOpenmoji = {value["hexcode"]: value for value in openmoji}
emojiToOpenmoji = {}
print("Following emoji couldn't be found in Openmoji:")
for emj in emoji.EMOJI_DATA:
    found = False
    if hexcode(emj) in hexToOpenmoji:
        emojiToOpenmoji[emj] = hexToOpenmoji[hexcode(emj)]
        found = True
    elif emj[-1] == '\ufe0f':
        # Remove the emoji variant U+FE0F and try again
        emj_no_variant = emj[0:-1]
        if hexcode(emj_no_variant) in hexToOpenmoji:
            emojiToOpenmoji[emj] = hexToOpenmoji[hexcode(emj_no_variant)]
            found = True
    else:
        # Append the emoji variant U+FE0F and try again
        emj_emoji_variant = emj + '\ufe0f'
        if hexcode(emj_emoji_variant) in hexToOpenmoji:
            emojiToOpenmoji[emj] = hexToOpenmoji[hexcode(emj_emoji_variant)]
            found = True

    if not found and emoji.EMOJI_DATA[emj]['status'] == emoji.STATUS['fully_qualified']:
        print(f"E{emoji.EMOJI_DATA[emj]['E']} {emoji.EMOJI_DATA[emj]['en']} {hexcode(emj)} {emj}")

print("###########################")

def replace_fct(emj, emj_data):
    if emj in emojiToOpenmoji:
        alt = html.escape(emj)
        title = html.escape(emj_data['en'])
        src = html.escape(emojiToOpenmoji[emj]["hexcode"]) + ".svg"
        return f'<img src="{src}" alt="{alt}" title="{title}">'
    else:
        return "Unsupported emoji"

print(emoji.emojize("a lion in html: :lion:", version=-1, handle_version=replace_fct))

For some emoji it is necessary to remove the variant indicator U+FE0F, or to add it, to find the emoji. With that modification the script can match all emoji that are fully qualified by Unicode except for the newest ones. It lists all emoji it cannot match, and they are all part of Unicode 14.0/E14, which OpenMoji doesn't include yet (hfg-gmuend/openmoji#344). So generating the hexcode at runtime is definitely an option instead of hard-coding it.
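The U+FE0F fallback can also be isolated into a small helper that runs without network access. This is only a sketch: the name find_hexcode and the set known are hypothetical, and the hexcodes in known are a made-up stand-in for illustration, not values checked against openmoji.json.

```python
VARIATION_SELECTOR_16 = "\ufe0f"

def hexcode(emoji_string):
    # Uppercase hex per code point, zero-padded to 4 digits, joined by "-".
    return "-".join(f"{ord(ch):04X}" for ch in emoji_string)

def find_hexcode(emj, known_hexcodes):
    """Return the hexcode under which emj is listed in known_hexcodes, or None."""
    # Try the emoji as-is first, then with U+FE0F stripped or appended.
    candidates = [emj]
    if emj.endswith(VARIATION_SELECTOR_16):
        candidates.append(emj[:-1])
    else:
        candidates.append(emj + VARIATION_SELECTOR_16)
    for candidate in candidates:
        code = hexcode(candidate)
        if code in known_hexcodes:
            return code
    return None

# Hypothetical stand-in for the hexcodes loaded from openmoji.json.
known = {"2764", "1F981", "263A-FE0F"}

print(find_hexcode("\u2764\ufe0f", known))  # 2764 (U+FE0F removed)
print(find_hexcode("\U0001F981", known))    # 1F981 (exact match)
print(find_hexcode("\u263a", known))        # 263A-FE0F (U+FE0F appended)
print(find_hexcode("\U0001FAE0", known))    # None (no variant matches)
```

Returning None for unmatched emoji leaves the caller free to decide on a fallback, such as keeping the original character.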

cvzi, Oct 15 '21 11:10