Address ambiguous emoji

Open dcow opened this issue 3 years ago • 8 comments

Stemming off of #27, I have a few thoughts that probably warrant their own issue:

It would be nice if the next version had a design goal relating to the ease of description and clarity of the various emoji. In other words, can the encoded text be described in a verbally succinct and unambiguous manner? As an added benefit, this would allow the (verbal and/or text) transmission of ecoji-encoded data to benefit from contextual human error correction (similar reasoning to the selection of the 2048 words for the bip-39 mnemonic representation of data).

Here are some examples of arguably ambiguous pairs of emoji:

    emojis[64] = "🌆"
    emojis[65] = "🌇"

I don't know if those first two are city skylines at sunrise and sunset or simply two different skylines at sunset or sunrise. If it's sunrise/sunset, I have no idea which is which.

    emojis[500] = "📅"
    emojis[501] = "📆"

Pretty difficult to pick out the page turn on 501.

    emojis[496] = "📁"
    emojis[497] = "📂"

"slightly open grey folder" vs "slightly more open grey folder"?

    emojis[80] = "🌖"
    emojis[76] = "🌒"

When you search for wan and wax in the iOS keyboard, both those emoji are listed as results for both queries.

    emojis[232] = "🎵"
    emojis[233] = "🎶"

2 eighth notes vs 3 eighth notes but to most people probably just "music notes"

    emojis[216] = "🎥"
    emojis[217] = "🎦"

"left facing camera" vs "right facing camera"

    emojis[95] = "🌧"
    emojis[96] = "🌨"

I think that last one is snow or ice but mostly just looks like a cloud with more raindrops.

    emojis[565] = "🔈"
    emojis[566] = "🔉"

I don't even know. I think there might be a faint sound "ring" being emitted from the second one.

My working proposal would be to focus on emoji from the "food", "animals & nature", "objects", "activities" and "travel & places" groups. Then, avoid emoji which have multiple variants or ones that are essentially ligatures.

I don't know why base 1024 was chosen (perhaps following bip-39's example of a fixed power-of-two list), but it may even be apropos to have a discussion on the base size after a groomed list of emoji is selected (so as not to feel pressure to fill a fixed number but rather use the unambiguous list size to more naturally determine the radix).
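To picture what "let the list size determine the radix" could look like: if the encoder keeps working on whole bits, the usable base is the largest power of two that fits in the groomed list. A rough sketch only; `usable_radix` is a made-up helper and the list sizes are illustrative, not real counts.

```python
def usable_radix(groomed_count: int) -> tuple[int, int]:
    """Return (bits_per_emoji, radix) for the largest power of two <= groomed_count."""
    if groomed_count < 2:
        raise ValueError("need at least two emoji to encode anything")
    bits = groomed_count.bit_length() - 1  # floor(log2(groomed_count))
    return bits, 1 << bits


# Illustrative list sizes (assumptions, not measurements of any real list).
for count in (300, 700, 1024, 1200):
    bits, radix = usable_radix(count)
    print(f"{count} groomed emoji -> base {radix} ({bits} bits per emoji)")
```

Anything left over above the power of two would simply go unused, or could serve as spare candidates for later swaps.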

dcow avatar Mar 02 '21 02:03 dcow

I agree the cityscape, calendar, and camera are all basically the same. I'm not so convinced on the moon: I might not know the difference at first, but I think people who don't get it would learn it pretty fast. I get what you're saying, though.

I also agree it would be valuable to have a clear objective, so we aren't all taking shots in the dark on the best way to improve ecoji. Personally, my ideal use-case would be to use ecoji instead of hex for my hashcodes and ids, so I can, for instance, expose a user's unique userId to them in a fun way, or generate really short but still secure one-time codes for logging into an app. The fact that this gives me the ability to generate very large numbers and represent them in a very small, easily readable way is killer, and I 💯% agree that striving for a set of really distinguishable characters would bring a lot of value to ecoji.
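To make the id use-case concrete, here is a small sketch of how a 5-byte id could become four alphabet indices at 10 bits each (base 1024), where hex would need 10 characters. It assumes full 5-byte chunks and skips padding; `to_indices` and `from_indices` are hypothetical names, not ecoji's actual code, and the lookup into the real emoji table is left out.

```python
def to_indices(chunk: bytes) -> list[int]:
    """Split exactly 5 bytes (40 bits) into four 10-bit alphabet indices."""
    if len(chunk) != 5:
        raise ValueError("this sketch only handles full 5-byte chunks")
    value = int.from_bytes(chunk, "big")
    # Take the 40-bit value apart, most significant 10 bits first.
    return [(value >> shift) & 0x3FF for shift in (30, 20, 10, 0)]


def from_indices(indices: list[int]) -> bytes:
    """Inverse of to_indices: pack four 10-bit indices back into 5 bytes."""
    value = 0
    for index in indices:
        value = (value << 10) | index
    return value.to_bytes(5, "big")


indices = to_indices(b"hello")
print(indices)                      # four numbers in range(1024)
assert from_indices(indices) == b"hello"
```

Each index would then select one emoji from the 1024-entry alphabet, which is why every entry needs to be distinguishable from all the others.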

For the radix I guess there's really no reason it couldn't be any other number 🤔, although in my limited experience writing these kinds of encoders I had assumed that binary numbers were better. Is there any explicit advantage at all? I should probably do some research 😅.

robindiddams avatar Mar 15 '21 01:03 robindiddams

> I agree the cityscape, calendar, and camera are all basically the same. I'm not so convinced on the moon: I might not know the difference at first, but I think people who don't get it would learn it pretty fast. I get what you're saying, though.

These are just examples to spark discussion (:

> Personally, my ideal use-case would be to use ecoji instead of hex for my hashcodes and ids, so I can, for instance, expose a user's unique userId to them in a fun way, or generate really short but still secure one-time codes for logging into an app. The fact that this gives me the ability to generate very large numbers and represent them in a very small, easily readable way is killer, and I 💯% agree that striving for a set of really distinguishable characters would bring a lot of value to ecoji.

This is my use case as well. Agree completely!

> For the radix I guess there's really no reason it couldn't be any other number 🤔, although in my limited experience writing these kinds of encoders I had assumed that binary numbers were better. Is there any explicit advantage at all? I should probably do some research 😅.

Right now ecoji is base 1024 (binary is base 2). The bit about the radix is simply to acknowledge that it may be difficult to come to a groomed list of 1024 emoji, but it might not be so difficult to land on 128, 256, or 512. Choosing the base is a balancing exercise between encoding density and alphabet size. While a larger base yields a denser encoding, it may be harder to use in the human-friendly way that you and I have as a design goal.
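To put rough numbers on that trade-off, here is a quick sketch of how many symbols a 128-bit id would need at each candidate base. The 128-bit size is just an assumption for illustration, `emoji_needed` is a hypothetical helper, and padding is ignored.

```python
def emoji_needed(payload_bits: int, base: int) -> int:
    """Symbols needed to carry payload_bits in a power-of-two base (no padding)."""
    if base < 2 or base & (base - 1):
        raise ValueError("this sketch only covers power-of-two bases")
    bits_per_symbol = base.bit_length() - 1  # log2(base) for a power of two
    return -(-payload_bits // bits_per_symbol)  # ceiling division


# Base 16 (hex) is included only as a familiar point of comparison.
for base in (16, 128, 256, 512, 1024):
    print(f"base {base:>4}: {emoji_needed(128, base)} symbols for a 128-bit id")
```

Under these assumptions, dropping from base 1024 to base 512 costs two extra emoji on a 128-bit id (15 instead of 13), while halving the alphabet that people have to tell apart.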

dcow avatar Mar 15 '21 02:03 dcow

Right, I'm more attached to a fun, colorful encoding than to compression, but having both is certainly ideal. There's a little over 1200 single codepoint emojis, so 1024 seems like the ideal to me, although in my current review of the set there are definitely plenty that are very similar, like you discovered.

robindiddams avatar Mar 16 '21 02:03 robindiddams

1024 is also just a really big alphabet to expect people to use. 1024 may be technically possible but might, in practice, be difficult for someone trying to enter these emoji by hand or communicate them verbally. I don't have good empirical data to support this hypothesis yet, though. Probably makes sense to shoot for 1024 at the moment and then discuss reducing the base size further after we have more experience trying to sort out ambiguous emoji and/or watching people try to use the alphabet.

dcow avatar Mar 16 '21 02:03 dcow

Given the stated goals of

> expose a user's unique userId to them in a fun way, or generate really short but still secure one time codes for logging into an app

I'm curious if there's a need to consider replacing emoji that could be controversial in some way. This could be a slippery slope, but I can imagine alcohol, weapons, smoking and drugs as the most likely to cause consternation. Imagine that an alcoholic had 🍻 in their userId, for example. 🥂🍺🍻🥃🍸🍷🍶🍹🔫🔪🗡️⚔️💀🚬🚭💊💉

This doesn't even begin to include others like 🖕🏩👙 that might be generally considered unprofessional.

An argument could also be made to filter out religious emoji like: ⛪🕍🛕🕌🙏🎅🤶🎄🕎✡️🕋 🕉️🕊️🛐✝️☦️☪️☸️ (not all of these are in the current list, but including them here anyways)

Or emoji with a common slang meaning, including: 🍆🍑💦

And then of course you could go one step further (slippery slope, like I said) and remove some of the emoji that could offend some folks, like: 👬👭 (and for the sake of avoiding heteronormative bias, might as well replace 🧑‍🤝‍🧑 too). 👳‍♂️ and 🧕 could potentially fit in this category or the religious one above as well.

And a final group would be emoji that might be too regional in nature, like: 🗾🎌🗽🗼

I'm definitely not trying to state that all of these (and I'm sure there are more that I missed) should be replaced, but I'd like to see where the conversation takes us!

tremblay avatar Mar 16 '21 21:03 tremblay

@tremblay I was really trying not to think about controversy but I'm glad you brought it up

> Imagine that an alcoholic had 🍻 in their userId, for example.

is a really good point (even though 🍻 is like my very favorite emoji!), and in the end, personally, I would think it's fun if my user id had 🖕 or 🔫 or 🔪 in it, but not at the expense of distressing someone else or causing devs to pass on ecoji because they're afraid some of the emojis might offend, oppress or otherwise distress viewers. Fortunately, I think the Unicode Consortium does a good job of mitigating that, and I think we would be able to draw plenty from the newer emoji versions.

robindiddams avatar Mar 31 '21 01:03 robindiddams

I'm trying to finalize the set of Emojis in Ecoji V2. The following have already been removed in Ecoji V2.

    emojis[217] = "🎦"
    emojis[95] = "🌧"
    emojis[96] = "🌨"

Out of the ambiguous pairs identified, the ones I would like to resolve the most are the following.

    emojis[64] = "🌆"
    emojis[65] = "🌇"

    emojis[500] = "📅"
    emojis[501] = "📆"

    emojis[496] = "📁"
    emojis[497] = "📂"

Looking over the current list of candidates for Ecoji V2, I only see two unused emojis that I like, which are U+23F0 ⏰ and U+23F3 ⏳. To resolve all three of the above, three replacement emojis would be needed. We could look at the replacements that have been done so far and see if there is anything we want to reconsider to get another emoji. I have not found any replacements that I would reconsider so far.

So with the two emojis and three choices, I think the calendar is the worst and want to replace it. I'm having a harder time deciding between the folder and cityscape for the second, but leaning towards the folder. So I'm thinking of making the following replacements.

  • replace 📆 with ⏰
  • replace 📂 with ⏳
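For anyone who wants to try the swaps out, here is a rough sketch of applying them to an alphabet table while checking that no emoji ends up listed twice. The six-entry `alphabet` is a hypothetical stand-in for the real 1024-entry table, and `apply_replacements` is not part of ecoji.

```python
# Proposed swaps from the list above: old emoji -> replacement.
replacements = {"📆": "⏰", "📂": "⏳"}


def apply_replacements(alphabet: list[str], swaps: dict[str, str]) -> list[str]:
    """Return a copy of alphabet with swaps applied, rejecting duplicates."""
    updated = [swaps.get(emoji, emoji) for emoji in alphabet]
    if len(set(updated)) != len(updated):
        raise ValueError("replacement introduced a duplicate emoji")
    return updated


# Tiny stand-in for the real table (hypothetical subset, not the actual order).
alphabet = ["📁", "📂", "📅", "📆", "🌆", "🌇"]
print(apply_replacements(alphabet, replacements))
```

The duplicate check matters because a replacement that is already in the table would make two indices decode to the same emoji.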

If anyone has any other suggestions let me know.

keith-turner avatar Feb 18 '22 17:02 keith-turner

This is done as much as it can be in the ecoji v2 branch. Can close this when that branch is merged.

keith-turner avatar Feb 20 '22 21:02 keith-turner