Emojis/Grapheme clusters seem to be broken in pyte

Open chubin opened this issue 5 years ago • 2 comments

Consider this Python 3 code:

# -*- coding: utf-8 -*-

from __future__ import print_function, unicode_literals

import pyte

if __name__ == "__main__":
    emoji_string = "☁️"
    print(emoji_string.encode("utf-8").hex())
    print("---")

    screen = pyte.Screen(80, 24)
    stream = pyte.Stream(screen)
    stream.feed(emoji_string)
    for character in screen.display[0][:3]:
        print(character.encode("utf-8").hex())

emoji_string contains one grapheme cluster, which is displayed like this in a terminal, editor, etc.:

(screenshot: Screenshot_2020-04-03_14-39-04)

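This emoji is displayed as a single character, but it consists of two code points: U+2601 (CLOUD, e29881 in UTF-8) and U+FE0F (VARIATION SELECTOR-16, efb88f in UTF-8). Pyte seems to drop the second part (or everything in the cluster after the first code point?), so the output of the program looks like this: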

e29881efb88f
---
e29881
20
20

We see that efb88f was dropped: e29881 is immediately followed by spaces (20).
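
For reference, the two parts of the cluster can be identified with the standard `unicodedata` module. This is a minimal sketch that does not involve pyte at all; the helper name `describe` is made up for illustration.

```python
import unicodedata

# The same cluster as in the report, written with escapes so that both
# code points are visible in the source.
emoji_string = "\u2601\ufe0f"


def describe(text):
    """Return (code point, Unicode name, UTF-8 hex) for each code point."""
    return [
        (
            "U+%04X" % ord(ch),
            unicodedata.name(ch, "<unnamed>"),
            ch.encode("utf-8").hex(),
        )
        for ch in text
    ]


for row in describe(emoji_string):
    print(row)
```

The second entry, the variation selector, is the part that is missing from the pyte output above.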

Is this a bug in pyte, or is it expected behaviour? Maybe I've missed some configuration mode?

chubin avatar Apr 03 '20 13:04 chubin

This is very likely a bug. Feel free to submit a PR ;)

superbobry avatar Apr 04 '20 11:04 superbobry

I have written a small workaround for this problem. It works fine for me, but I don't think it is a good solution for this bug.

This is how I do it:

  import grapheme  # third-party dependency


  def _fix_graphemes(text):
      """
      Extract long grapheme sequences that can't be handled
      by pyte correctly because of the bug pyte#131.
      Graphemes are omitted and replaced with placeholders,
      and returned as a list.
  
      Return:
          text_without_graphemes, graphemes
      """
  
      output = ""
      graphemes = []
  
      for gra in grapheme.graphemes(text):
          if len(gra) > 1:
              # Cluster made of several code points: replace it with a placeholder.
              character = "!"
              graphemes.append(gra)
          else:
              character = gra
          output += character
  
      return output, graphemes

I extract the graphemes before rendering, like this:

text, graphemes = _fix_graphemes(text)

and then after rendering I put them back.
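
The restore step is not shown in the thread. A minimal sketch of what it could look like follows; it is an assumption, not the author's code. The name `restore_graphemes` and the private-use placeholder U+E000 are hypothetical (the workaround above uses "!", which would collide with literal exclamation marks in the text). It also assumes that the rendered lines keep the placeholders in their original order and that each placeholder occupies exactly one cell, neither of which has been verified against pyte.

```python
# Hypothetical placeholder: a private-use code point that is unlikely
# to appear in real text. The workaround above uses "!" instead.
PLACEHOLDER = "\ue000"


def restore_graphemes(lines, graphemes, placeholder=PLACEHOLDER):
    """Put extracted graphemes back into rendered lines, in order.

    `lines` is a list of strings (for example, the rows of a rendered
    screen); `graphemes` is the list collected before rendering.
    Placeholders left over after the list is exhausted stay unchanged.
    """
    remaining = iter(graphemes)
    restored = []
    for line in lines:
        restored.append(
            "".join(
                next(remaining, ch) if ch == placeholder else ch
                for ch in line
            )
        )
    return restored


rendered = ["a" + PLACEHOLDER + "b", PLACEHOLDER + "!"]
print(restore_graphemes(rendered, ["\u2601\ufe0f", "\u2600\ufe0f"]))
```

Because a restored cluster may be wider on screen than the one-cell placeholder that stood in for it, column alignment after restoring is not guaranteed.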

It works as it should, but I am not sure that this method is (1) general enough or (2) good for pyte, because it introduces a new dependency: grapheme.

chubin avatar Apr 12 '20 12:04 chubin