Treat emoji as single character when calculating length for a title

Open cointastical opened this issue 3 years ago • 2 comments

SN has implemented an 80-character limit for the title of a post. If I use an emoji in a title, it counts as two characters toward that limit.

For example, this title would trigger the hint message "1 too many":

1234567890123456789012345678901234567890123456789012345678901234567890123456789📺

An emoji renders wider than an ASCII character, yes, so maybe that is the reasoning for an emoji counting as two characters. But other Unicode characters also render wider than an ASCII character (e.g., the ideographic comma "、", which stands in for a comma plus a space), and those are counted as only a single character (i.e., not penalized).

So I don't know if the emoji counted as two characters was intentional or not, but my request is to count it as a single character. Or if they count as two to prevent titles with many emojis rendering wider, maybe if there is only a single emoji, then that one emoji gets treated as a single character, but when the title has multiple emojis then the current behavior (treating as two characters each) is warranted.

It might seem much ado about nothing (the difference of one single character), but when posting a podcast or video where the content's original title is too long for SN, every character counts. Fitting in one extra character will sometimes make all the difference. Thus I ask that this request be considered.

cointastical avatar Oct 13 '22 11:10 cointastical

So I don't know if the emoji counted as two characters was intentional or not, but my request is to count it as a single character.

I don't think this was intentional. It's just how JavaScript determines the length of strings: it uses UTF-16 encoding, and length counts UTF-16 code units rather than visible characters. See https://stackoverflow.com/questions/38345372/why-is-length-2

Out of interest, I looked up the raw bytes of an emoji vs. normal characters in UTF-8 encoding using this website:

  1. "test" -> 74 65 73 74
  2. "tes📺" -> 74 65 73 F0 9F 93 BA

As you can see, the emoji takes 4 bytes in UTF-8, four times the size of a "normal" ASCII character (see this table for ASCII characters). UTF-8 is backwards compatible with ASCII, so ASCII characters result in the same bytes in either encoding.

Using UTF-16, this happens:

  1. "test" -> 74 00 65 00 73 00 74 00
  2. "tes📺" -> 74 00 65 00 73 00 3D D8 FA DC

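Now the emoji takes two times the size of a "normal" ASCII character: it needs two 16-bit code units (a surrogate pair) where an ASCII character needs one. JavaScript's length counts code units, hence the emoji uses up two characters.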
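
For anyone who wants to check this without a website, here is a minimal sketch that prints both representations. It assumes an environment with the standard TextEncoder global (modern browsers, Node.js); the variable names are just for illustration. The UTF-16 line shows code units, not bytes, so in little-endian byte order each unit's two bytes appear swapped (0074 becomes 74 00).

```javascript
// Inspect how "tes📺" is represented in UTF-8 and in UTF-16.
const sample = "tes📺";

// UTF-8 bytes, via the standard TextEncoder (always encodes as UTF-8).
const utf8Hex = Array.from(new TextEncoder().encode(sample), (byte) =>
  byte.toString(16).toUpperCase().padStart(2, "0")
);

// UTF-16 code units: these are what String.prototype.length counts.
const utf16Units = [];
for (let i = 0; i < sample.length; i++) {
  utf16Units.push(
    sample.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(utf8Hex.join(" "));    // 74 65 73 F0 9F 93 BA
console.log(utf16Units.join(" ")); // 0074 0065 0073 D83D DCFA
console.log(sample.length);        // 5 (three letters + two code units for the emoji)
```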

ekzyis avatar Oct 15 '22 22:10 ekzyis

I think this can be easily fixed using Array.from as described here: https://stackoverflow.com/questions/38345372/why-is-length-2/46085089#46085089
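
A rough sketch of what that could look like, assuming the limit is checked against the raw string length today. MAX_TITLE_LENGTH, titleLength and charsOver are hypothetical names for illustration, not SN's actual code.

```javascript
// Hypothetical names; SN's real validation code may look different.
const MAX_TITLE_LENGTH = 80;

// Array.from iterates a string by Unicode code point, so a surrogate
// pair such as 📺 becomes one array element instead of two code units.
function titleLength(title) {
  return Array.from(title).length;
}

// Number of characters over the limit (0 if the title fits).
function charsOver(title) {
  return Math.max(0, titleLength(title) - MAX_TITLE_LENGTH);
}

// The example title from the issue: 79 digits followed by one emoji.
const title = "1234567890".repeat(7) + "123456789" + "📺";

console.log(title.length);       // 81 (UTF-16 code units)
console.log(titleLength(title)); // 80 (code points)
console.log(charsOver(title));   // 0
```

One caveat: this counts code points, not what a reader perceives as one character. Emoji built from several code points (for example ones joined with a zero-width joiner or carrying a skin-tone modifier) would still count as more than one.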

ekzyis avatar Oct 15 '22 22:10 ekzyis

I think having a small penalty for emojis is fine. They are attention-expensive, at more than twice the rate of a normal character. Going to close.

huumn avatar Jan 11 '23 18:01 huumn