feature-requests font renderer: support the notion of "glyphsets"

Describe the problem you have/What new integration you would like

I'm Greek. I'd like to be able to show text, using the display/font component, in the Greek alphabet, in addition to the Latin alphabet.

Please describe your use case for this integration and alternatives you've tried:

For this purpose, there is the glyph attribute in the component's configuration, where I can add every character. However, that is not trivial to define once you get started: you need to enumerate all characters, as well as all the combinations of characters + accents (as no canonicalization form is applied). I've made the effort of course, but it's a shame for every Greek speaker to have to go through the same trouble.
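To illustrate the canonicalization point, here is a minimal sketch using only the stdlib's unicodedata. It shows that the same accented Greek letter can arrive as one code point or as two, so a glyph list that only contains one form will miss the other. The helper name to_nfc is hypothetical and is not part of the font component.

```python
import unicodedata

# Two encodings of the same visible letter, alpha with tonos.
precomposed = "\u03ac"       # GREEK SMALL LETTER ALPHA WITH TONOS
decomposed = "\u03b1\u0301"  # GREEK SMALL LETTER ALPHA + COMBINING ACUTE ACCENT


def to_nfc(text):
    """Hypothetical helper: compose base letters and accents where possible."""
    return unicodedata.normalize("NFC", text)


# Without normalization the two strings are different code point sequences.
print(precomposed == decomposed)
# After NFC normalization they compare equal.
print(to_nfc(decomposed) == precomposed)
# The decomposed form needs two separate glyphs.
print([unicodedata.name(c) for c in decomposed])
```

Normalizing text to NFC before rendering would be one way to shrink the set of glyphs that has to be listed, though whether that fits the component is a separate question.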

It'd be great if there was a way to configure the component with some notion of "latin", "greek", and so on and so forth.

A few alternatives I've considered and am seeking input on:

1. Use the notion of Unicode blocks. In my case, it would be "Greek and Coptic", U+0370 - U+03FF. stdlib's unicodedata does not include this functionality, but we could use a library like https://github.com/nagisa/unicodeblocks (unfortunately unmaintained). The other downside is that Unicode's blocks are fairly wide, e.g. in this case it includes a bunch of Coptic characters, which won't be useful to me (or anyone else, really).
2. Statically define a few alphabets as maps to character ranges, in the ESPHome code, seeded from an original small list and then updated as needed by users across the globe.
3. Use the Google Fonts Glyphsets Python library (pip install glyphsets), which includes glyph ranges (not blocks!) as defined by the Google Fonts team, such as GF_Latin_Core and GF_Greek_Core. These are way more accurate, but the library a) brings a bunch of dependencies and b) has a few rough edges, as it was really designed for font authors etc.
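The first two alternatives (Unicode blocks, or a static map of alphabets to character ranges) could be sketched roughly as follows, using only the stdlib. The names ALPHABET_RANGES and glyphs_for are hypothetical and do not exist in ESPHome; the ranges are just the Basic Latin printable range and the "Greek and Coptic" block mentioned above. The optional name filter is one possible way to drop the Coptic letters from that block.

```python
import unicodedata

# Hypothetical static map: alphabet name -> inclusive code point ranges.
ALPHABET_RANGES = {
    "latin": [(0x0020, 0x007E)],  # printable Basic Latin
    "greek": [(0x0370, 0x03FF)],  # the "Greek and Coptic" block
}


def glyphs_for(alphabet, name_prefix=None):
    """Expand an alphabet's ranges into a string of assigned characters.

    If name_prefix is given, keep only characters whose Unicode name
    starts with it (e.g. "GREEK" to skip the Coptic letters).
    """
    chars = []
    for start, end in ALPHABET_RANGES[alphabet]:
        for code_point in range(start, end + 1):
            char = chr(code_point)
            name = unicodedata.name(char, "")
            if not name:
                continue  # unassigned code point, no glyph to render
            if name_prefix and not name.startswith(name_prefix):
                continue
            chars.append(char)
    return "".join(chars)


print(glyphs_for("latin"))
print(glyphs_for("greek", name_prefix="GREEK"))
```

A block-based range like this still misses characters outside the block (polytonic Greek lives in a separate "Greek Extended" block), which is part of why curated glyph ranges are more accurate.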

I actually have code that implements option (3), but I'm still wondering if it's the way to go and was hoping to agree on the way forward before opening up a PR :) For the dependencies, I've actually used the same trick as we do for pillow, where it's dynamically imported and if not found, prints an error - not sure if that's desired here?
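For reference, the dynamic-import trick looks roughly like the sketch below. This is not the actual ESPHome code: optional_import is a hypothetical helper, and it raises an exception where the real component may simply print an error.

```python
import importlib


def optional_import(module_name, install_hint):
    """Import a module only when needed, with a readable error if it is missing.

    Hypothetical helper for illustration; not part of ESPHome.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise RuntimeError(
            f"The '{module_name}' package is required for this feature. "
            f"{install_hint}"
        ) from err


# Usage sketch: only resolve the dependency when glyphsets are requested.
def load_glyphsets_module():
    return optional_import("glyphsets", "Install it with: pip install glyphsets")
```

The upside is that users who never configure glyphsets pay no dependency cost; the downside is the manual install step for those who do.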

The diffstat for the record is:

 esphome/components/font/__init__.py | 44 ++++++++++++++++++++++++++++++++++++++++++--
 esphome/const.py                    |  1 +

Additional context

List of Unicode blocks, for option 1: https://en.wikipedia.org/wiki/Unicode_block

List of Google Fonts glyphsets, for option 3:

GF_Arabic_Core
GF_Arabic_Plus
GF_Cyrillic_Core
GF_Cyrillic_Historical
GF_Cyrillic_Plus
GF_Cyrillic_Pro
GF_Greek_AncientMusicalSymbols
GF_Greek_Archaic
GF_Greek_Coptic
GF_Greek_Core
GF_Greek_Expert
GF_Greek_Plus
GF_Greek_Pro
GF_Latin_African
GF_Latin_Beyond
GF_Latin_Core
GF_Latin_Kernel
GF_Latin_Plus
GF_Latin_PriAfrican
GF_Latin_Vietnamese
GF_Phonetics_APA
GF_Phonetics_DisorderedSpeech
GF_Phonetics_IPAHistorical
GF_Phonetics_IPAStandard
GF_Phonetics_SinoExt
GF_TransLatin_Arabic
GF_TransLatin_Pinyin

(per glyphsets.defined_glyphsets())

A few examples for glyphsets:

import glyphsets

print("".join([chr(x) for x in glyphsets.unicodes_per_glyphset("GF_Latin_Kernel")]))
print("".join([chr(x) for x in glyphsets.unicodes_per_glyphset("GF_Latin_Core")]))
# Union of two glyphsets, de-duplicated and sorted by code point
print("".join([chr(x) for x in sorted(set(glyphsets.unicodes_per_glyphset("GF_Latin_Kernel")) | set(glyphsets.unicodes_per_glyphset("GF_Greek_Core")))]))

Jun 29 '24 09:06 paravoid

Cc @clydebarrow as code owner. Thanks in advance :)

Jun 29 '24 09:06 paravoid

Basic idea sounds good (I don't use any languages with non-Latin alphabets, so I'm not qualified to assess the need for this feature.)

With regard to dependencies, I find the need to manually install pillow already a pain point, so I wonder if there is any good reason why it and the glyphsets library could not be included in requirements.txt by default?

Jun 29 '24 13:06 clydebarrow