Cyrillic font 6x8, and special character support for scrolling text

Open bsveselov-rgb opened this issue 3 months ago • 19 comments

Good day, gentlemen! I'm trying to edit a 6x8 font file to add Cyrillic letters in scrolling text mode. I followed the instructions on the official website, but the project won't compile in VS Code, and I'm getting a ton of errors when selecting the board. I have zero programming experience. Could anyone help me with this? Thanks in advance.

bsveselov-rgb avatar Nov 18 '25 13:11 bsveselov-rgb

@bsveselov-rgb will your Cyrillic letters replace the normal "roman" ASCII characters 32 through 127?

Currently WLED scrolling text does not support Unicode, or any other extended character set. For example, not even the German "Umlaut" letters (äöüÄÖÜß) are supported. The same goes for other writing systems like Greek, Arabic, Hebrew, Hindi, Korean, Chinese, Japanese, Cyrillic, Emoji, etc.

This was a design decision, to cope with limited Flash memory.

softhack007 avatar Nov 18 '25 14:11 softhack007

Yes. I need a version with only the Cyrillic font. After consulting DeepSeek and reading, I found a font with 6x8 Cyrillic characters. Is it possible to compile the firmware with just one font version, for example?

bsveselov-rgb avatar Nov 18 '25 14:11 bsveselov-rgb

you would need to replace the content of wled00/src/font/console_font_6x8.h

https://github.com/wled/WLED/blob/aaad450175c38dd228146fd81940e707023c8059/wled00/src/font/console_font_6x8.h#L788-L798

by exchanging all "roman" letters like 'A' with a Cyrillic letter (bitmap encoded in bytes), one by one.

I don't know how a Cyrillic keyboard works: does it produce ASCII codes below 126? It might still not lead to the expected result when you use the keyboard to type messages. Maybe you'll have to "encode" each Cyrillic character of the text manually, letter by letter.

softhack007 avatar Nov 18 '25 14:11 softhack007

Yes, I understood that and found the necessary code for the characters. Here's an example:

// А (0xC0)
    0x30, /* 001100 */
    0x48, /* 010010 */
    0x48, /* 010010 */
    0x78, /* 011110 */
    0x48, /* 010010 */
    0x48, /* 010010 */
    0x00, /* 000000 */
    0x00, /* 000000 */
    
    // Б (0xC1)
    0x78, /* 011110 */
    0x40, /* 010000 */
    0x70, /* 011100 */
    0x48, /* 010010 */
    0x48, /* 010010 */
    0x70, /* 011100 */
    0x00, /* 000000 */
    0x00, /* 000000 */
    
    // В (0xC2)
    0x70, /* 011100 */
    0x48, /* 010010 */
    0x70, /* 011100 */
    0x48, /* 010010 */
    0x48, /* 010010 */
    0x70, /* 011100 */
    0x00, /* 000000 */
    0x00, /* 000000 */
    
    // Г (0xC3)
    0x78, /* 011110 */
    0x40, /* 010000 */
    0x40, /* 010000 */
    0x40, /* 010000 */
    0x40, /* 010000 */
    0x40, /* 010000 */
    0x00, /* 000000 */
    0x00, /* 000000 */
    
    // Д (0xC4)
    0x30, /* 001100 */
    0x50, /* 010100 */
    0x50, /* 010100 */
    0x50, /* 010100 */
    0x50, /* 010100 */
    0x78, /* 011110 */
    0x88, /* 100010 */
    0x00, /* 000000 */
    
    // Е (0xC5)
    0x78, /* 011110 */
    0x40, /* 010000 */
    0x70, /* 011100 */
    0x40, /* 010000 */
    0x40, /* 010000 */
    0x78, /* 011110 */
    0x00, /* 000000 */
    0x00, /* 000000 */

The problem is, I can't compile the bin file for the firmware. Can I ask you for help by posting or sending the code for the Cyrillic characters?

bsveselov-rgb avatar Nov 18 '25 14:11 bsveselov-rgb

Can I ask you for help by posting or sending the code for the Cyrillic characters?

sorry, not myself - I'm too busy with other stuff atm :-(

But you can ask on discord, where we have a very active user community and lots of people who are ready to help others with their projects

Join the Discord server to discuss everything about WLED!

softhack007 avatar Nov 18 '25 15:11 softhack007

Yes, I understood that and found the necessary code for the characters. Here's an example:

// А (0xC0)

PS: maybe you still did not do it right 🤔 as I said, you cannot "add" the Cyrillic letters at an extended code position (here: 0xC0, which is outside 7-bit ASCII) because WLED scrolling text only accepts letters up to ASCII 126 = 0x7E. It means that you must replace roman letters, for example exchange the bytes for "B" (0x42) with the bytes to draw "Б".
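For illustration, here is a minimal sketch of that replacement. It assumes the 6x8 table stores 8 bytes per glyph, starts at ASCII 32, and keeps the six pixels of a row in the top bits of each byte, as the bitmap comments in the example above suggest. The names `demoFont`, `glyphOffset` and `replaceGlyph` are hypothetical; in the real `console_font_6x8.h` the table is a constant array, so the bytes are exchanged by hand in the source file rather than at runtime.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr int kFirstChar = 32;   // assumed: table starts at the space character
constexpr int kGlyphHeight = 8;  // 6x8 font: one byte per row, 8 rows

// Byte offset of a character's glyph inside the font table.
constexpr std::size_t glyphOffset(char c) {
  return static_cast<std::size_t>(static_cast<unsigned char>(c) - kFirstChar) * kGlyphHeight;
}

// Hypothetical stand-in for the font table (95 glyphs for ASCII 32..126).
std::array<std::uint8_t, 95 * kGlyphHeight> demoFont{};

// Bitmap for the Cyrillic letter Б, taken from the example in this thread.
constexpr std::array<std::uint8_t, kGlyphHeight> kCyrillicBe = {
    0x78, 0x40, 0x70, 0x48, 0x48, 0x70, 0x00, 0x00};

// Overwrite the slot of a roman letter with a replacement bitmap.
void replaceGlyph(char slot, const std::array<std::uint8_t, kGlyphHeight>& bitmap) {
  assert(slot >= 32 && slot <= 126);  // only printable ASCII slots exist
  const std::size_t base = glyphOffset(slot);
  for (int row = 0; row < kGlyphHeight; ++row) {
    demoFont[base + row] = bitmap[row];
  }
}
```

After `replaceGlyph('B', kCyrillicBe)`, typing "B" in the scrolling text would draw Б, because the slot for 0x42 now holds the Cyrillic bitmap.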

softhack007 avatar Nov 18 '25 15:11 softhack007

Just for the curious: I've made a rough overview of what would be needed to support the full CP437, and later even add fonts for other codepages

  • https://github.com/MoonModules/WLED-MM/issues/281

There are of course a lot of "known unknowns" - for example I'm not sure if the esp32 framework already provides something to convert UTF-8 strings to CP437 strings. Finding Font files for other codepages might be a big challenge, and it is not clear if extra font files would ever be integrated into the main WLED codebase. Full codepage support would always be optional (custom build_flags), because localized fonts will increase firmware size considerably.

Btw, 8266 will probably never have this feature, due to 8266 firmware size limits. But I could imagine a solution for esp32 boards with at least 8MB flash.

softhack007 avatar Nov 18 '25 17:11 softhack007

@softhack007 I've already recommended that someone implements loading font files (binary or JSON or whatever) from file system. That would allow anyone to upload custom font without the need for custom compiling. However, that feature might be only feasible on ESP32 (and variants) as ESP8266 lacks free RAM for font data.

I do not think implementing full Unicode support is feasible but a font file might allow this (at the cost of accessing flash during effect drawing).

blazoncek avatar Nov 18 '25 17:11 blazoncek

I've already recommended that someone implements loading font files (binary or JSON or whatever) from file system. That would allow anyone to upload custom font without the need for custom compiling.

I just had the same thought and ran a quick test to see if the font can be read directly from the file on every access. Well, it can't: opening a file, reading a few bytes, then closing it takes 25ms... BUT if the file is kept open, the access is very fast, 100us or so. LittleFS allows multiple files to be open; I checked up to 4 without issues. If this also works on ESP8266, then there is no need to load it into RAM.

edit: accessing 4 files in parallel also works on ESP8266. It is a bit slower though: reading 10 bytes per pass takes 270us, still more than fast enough, so this approach would be feasible.
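A rough sketch of the "keep the file open" idea, written with standard C++ streams as a stand-in for the LittleFS file API (an assumption made so that the example is self-contained; it says nothing about the timings measured above). The class name `OpenFontFile` and the flat, header-less file layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Keeps one font file open for its whole lifetime, so each glyph access is
// only a seek plus a short read instead of an open/read/close cycle.
class OpenFontFile {
 public:
  OpenFontFile(const std::string& path, std::size_t bytesPerGlyph)
      : file_(path, std::ios::binary), bytesPerGlyph_(bytesPerGlyph) {
    assert(bytesPerGlyph > 0);
  }

  bool isOpen() const { return file_.is_open(); }

  // Reads the glyph with the given index. Returns an empty vector on failure,
  // for example when the index lies beyond the end of the file.
  std::vector<std::uint8_t> readGlyph(std::size_t index) {
    std::vector<std::uint8_t> glyph(bytesPerGlyph_);
    file_.clear();  // reset any earlier end-of-file state before seeking
    file_.seekg(static_cast<std::streamoff>(index * bytesPerGlyph_));
    file_.read(reinterpret_cast<char*>(glyph.data()),
               static_cast<std::streamsize>(glyph.size()));
    if (file_.gcount() != static_cast<std::streamsize>(glyph.size())) {
      glyph.clear();  // short read: this glyph is not in the file
    }
    return glyph;
  }

 private:
  std::ifstream file_;
  std::size_t bytesPerGlyph_;
};
```

The effect would create one such object when it starts and reuse it for every character it draws.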

DedeHai avatar Nov 18 '25 18:11 DedeHai

@DedeHai so for the full picture, we'd need a string conversion from UTF-8 (or unicode?) to codepage xyz encoding, to have an index for the font tables ... any ideas?

edit: for boards with PSRAM, we could even load the font files into PSRAM for caching.

softhack007 avatar Nov 18 '25 18:11 softhack007

I already have it sketched out in my head, something like this:

  • pack the chars as bits, padded to full bytes for easier access, so a 7x9 char would be 8 bytes (63 bits plus one padding bit) instead of the currently used 9 bytes.
  • to keep it simple: the first three bytes in the font file describe the font format, first byte is the bits-per-char, second is the start offset, third is the end of the table.

this of course needs some Python script to generate these binary blobs from the current format, or, if there are other commonly available pixel font formats, to convert those as well. Extended ASCII (codepage) tables always have 256 entries; 128-255 are the extended chars. Here is the one for Greek, for example: https://www.ascii-codes.com/cp869.html

so a "Greek only font" without fancy stuff would start somewhere around char 164 and end at char 238, but nothing says it cannot be a full 0-255 table supporting all chars, including numbers.

the naming could be font1.fnt to font5.fnt, acting as overrides that replace the current 5 fonts. it would in theory also allow for fonts with much larger chars for HUB75; the limit would only be the access time to read the chars, but it is quite fast, less than 10us per byte.

edit: actually the limit would be 256bits per char unless we make that header 4 bytes, which would be reasonable I guess.

edit2: now that I think about it, it really makes little sense to not always start a font at char 0, since it's stored in FS we can spare a few hundred extra bytes to keep the fonts simple and more universal for users. If a font does not contain the extended chars > 127, the file read at that position will just fail, so should be easy to catch.

edit3: forgot we need width and height of a char, so header would be 2 bytes for width and height. To support all languages, we'd need UTF-8 support, which I don't think is feasible with reasonable effort.
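To make the format above concrete, here is a sketch under the assumptions from the edits: a 2-byte header (width, height), glyphs bit-packed and padded to whole bytes, and a table that always starts at char 0. The struct `FontBlob`, the row-by-row, most-significant-bit-first packing order, and the in-memory buffer are hypothetical choices, not a settled file format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical header layout: byte 0 = glyph width, byte 1 = glyph height.
constexpr std::size_t kHeaderSize = 2;

struct FontBlob {
  std::vector<std::uint8_t> data;  // header followed by bit-packed glyphs

  // Both accessors assume the blob holds at least the 2 header bytes.
  std::uint8_t width() const { return data[0]; }
  std::uint8_t height() const { return data[1]; }

  // Bits of one glyph, rounded up to whole bytes (7x9 = 63 bits -> 8 bytes).
  std::size_t bytesPerGlyph() const {
    return (static_cast<std::size_t>(width()) * height() + 7) / 8;
  }

  // True if the blob actually contains the glyph for this character code.
  bool hasGlyph(std::uint8_t code) const {
    return kHeaderSize + (code + 1u) * bytesPerGlyph() <= data.size();
  }

  // Pixel (x, y) of a glyph; bits are packed row by row, MSB first.
  bool pixel(std::uint8_t code, unsigned x, unsigned y) const {
    assert(x < width() && y < height());
    if (!hasGlyph(code)) return false;  // missing glyphs render as blank
    const std::size_t bit = static_cast<std::size_t>(y) * width() + x;
    const std::size_t byteIndex = kHeaderSize + code * bytesPerGlyph() + bit / 8;
    return ((data[byteIndex] >> (7 - bit % 8)) & 1u) != 0;
  }
};
```

A font without extended chars simply ends early, so `hasGlyph()` plays the role of the "file read at that position will just fail" check.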

DedeHai avatar Nov 18 '25 19:11 DedeHai

@DedeHai so we would still need a way to convert unicode (UTF-8) - which is used by the WebUI when you name your segment in greek, or even you name it "schöne Grüße aus Grønland 👋 ", into the "codepage" that's used by your font file. Of course some unicode chars will not be supported by the font. However some basic unicode parser / translator would be necessary to pick the right character to display.

softhack007 avatar Nov 18 '25 20:11 softhack007

To support all languages, we'd need UTF-8 support, which I don't think is feasible with reasonable effort.

A possible solution for the "codepage mapping" might be to translate UTF-8 into 16-bit codes (UTF-16 restricted to single code units, which is effectively UCS-2). This covers almost any letter supported by UTF-8 and always needs exactly two bytes per letter, if we forget about the 4-byte variable codes for exotic stuff. Codes below 128 are always ASCII codes, so no mapping is needed for roman letters, numbers etc.

  1. Font file: for all "characters" > 126, add two bytes which indicate the UTF-16 code for the letter or symbol (might be replaced with a generic UTF16-to-codepage_xyz translation list for all user fonts)

  2. WLED translates the segment name from UTF-8 into UTF-16 (straightforward, no LUT involved)

  3. if UTF16_index < 127 --> display font[UTF16_index]

  4. if UTF16_index >= 127 -> scan the font until we find font[x].utf16 == UTF16_index --> display if found, show " " otherwise

The whole process could be accelerated if we can build a translation table in RAM.

Some magic python script could generate the font file, and generate UTF-16 indices when needed.
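Steps 3 and 4 could look roughly like the sketch below. The types `TaggedGlyph` and `TaggedFont` are hypothetical, and the linear scan is the unaccelerated variant without a translation table in RAM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical font entry: each glyph carries the 16-bit code it represents.
struct TaggedGlyph {
  std::uint16_t utf16;   // code this glyph stands for
  std::uint8_t rows[8];  // bitmap, one byte per row
};

class TaggedFont {
 public:
  std::vector<TaggedGlyph> ascii;     // codes 32..126, indexed by code - 32
  std::vector<TaggedGlyph> extended;  // codes above 126, in file order

  // Returns the glyph for a code, or a blank glyph (" ") if the font lacks it.
  const TaggedGlyph& find(std::uint16_t code) const {
    static const TaggedGlyph blank = {0x0020, {0, 0, 0, 0, 0, 0, 0, 0}};
    assert(ascii.size() <= 95);  // the direct-index part only covers 32..126
    if (code < 127) {
      // Step 3: plain ASCII needs no search, the code is the index.
      const std::size_t index = code >= 32 ? code - 32u : ascii.size();
      return index < ascii.size() ? ascii[index] : blank;
    }
    // Step 4: scan the extended glyphs for a matching 16-bit tag.
    for (const TaggedGlyph& glyph : extended) {
      if (glyph.utf16 == code) return glyph;
    }
    return blank;
  }
};
```

Sorting `extended` by code and using a binary search would be one cheap way to speed this up without an extra table.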

softhack007 avatar Nov 18 '25 20:11 softhack007

Slightly related: this is how Unicode (and its UTF-8 encoding) works: https://youtu.be/gd5uJ7Nlvvo (skip to time index 45:25 if you're impatient)

softhack007 avatar Nov 19 '25 01:11 softhack007

Loading font data from files would be a great effort. It could also enable flexible font widths, different font styles etc. With these easily editable files it would also be possible to replace certain characters with custom ones. Some time ago I wrote some lines and referenced the MD_Parola library, which also uses font files: https://github.com/wled/WLED/issues/3071 also mentions some other ideas for scrolling text. The UTF8 conversion can be done like this: https://forum.arduino.cc.....

ElToberino avatar Nov 19 '25 08:11 ElToberino

  1. Font file: for all "characters" > 126, add two bytes which indicate the UTF-16 code for the letter or symbol (might be replaced with a generic UTF16-to-codepage_xyz translation list for all user fonts)

will this work both ways? the text is stored in preset as segment name.

DedeHai avatar Nov 19 '25 09:11 DedeHai

will this work both ways? the text is stored in preset as segment name.

I think it will work. Most likely we don't even recognise that presets and segment names are already UTF-8, because when you enter a string in the webUI, it will arrive in WLED as unicode (UTF-8), and the same string (even when handled as char *) sent back to the UI will be decoded as UTF-8. That's why people can already use greek letters or emoji as segment names.

Actually for Font rendering in scrolling text, only the "forward" direction will be needed because we can do it without modifying seg.name.

  • new function: DrawString receives the segment name as input
  • -> copy seg.name to temporary string for conversion
  • -> convert to UTF-16, but skip "extended codes" which require more than 2 bytes (extended = emoji and other funstuff)
  • -> new function byte UTF16_to_FontChar(uint16_t wchar, font&) --> convert to font -specific byte_index
  • -> draw font[byte_index]

for the first shot, we can have a static conversion (switch-case) in UTF16_to_FontChar that maps UTF-16 to a codepage (like CP437) used by the font. The wikipedia CP437 page has all the unicode codes needed for this CP437 translation, so I can grab them and directly generate case codepoint_abcd: return index_xyz; break; with a simple shell script.

In general there seems to be some C++ standard template class that can do the conversion, but I did not look deeper into that.
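A minimal sketch of the two conversion steps described above, not the planned WLED code. The function names `utf8ToCodeUnits` and `codeUnitToCp437` are hypothetical, the decoder does not reject overlong encodings, and the switch lists only the German letters as a sample of the CP437 mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes UTF-8 into 16-bit codes. Four-byte sequences (emoji etc.) and
// malformed bytes are skipped.
std::vector<std::uint16_t> utf8ToCodeUnits(const std::string& text) {
  std::vector<std::uint16_t> out;
  std::size_t i = 0;
  while (i < text.size()) {
    const std::uint8_t lead = static_cast<std::uint8_t>(text[i]);
    std::size_t extra = 0;   // number of continuation bytes expected
    std::uint32_t code = 0;
    if (lead < 0x80) { extra = 0; code = lead; }
    else if ((lead & 0xE0) == 0xC0) { extra = 1; code = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; code = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; code = 0; }
    else { ++i; continue; }  // stray continuation or invalid lead byte

    bool ok = i + extra < text.size();
    for (std::size_t k = 1; ok && k <= extra; ++k) {
      const std::uint8_t next = static_cast<std::uint8_t>(text[i + k]);
      if ((next & 0xC0) != 0x80) ok = false;
      else code = (code << 6) | (next & 0x3F);
    }
    if (!ok) { ++i; continue; }  // malformed or truncated: drop the lead byte

    if (extra < 3) {
      assert(code <= 0xFFFF);  // 1- to 3-byte sequences always fit in 16 bits
      out.push_back(static_cast<std::uint16_t>(code));
    }
    i += extra + 1;
  }
  return out;
}

// Maps a 16-bit code to a CP437 byte; unknown codes become a space.
std::uint8_t codeUnitToCp437(std::uint16_t code) {
  if (code < 0x80) return static_cast<std::uint8_t>(code);  // ASCII maps 1:1
  switch (code) {
    case 0x00C4: return 0x8E;  // Ä
    case 0x00D6: return 0x99;  // Ö
    case 0x00DC: return 0x9A;  // Ü
    case 0x00DF: return 0xE1;  // ß
    case 0x00E4: return 0x84;  // ä
    case 0x00F6: return 0x94;  // ö
    case 0x00FC: return 0x81;  // ü
    default:     return ' ';
  }
}
```

Because the input string is only read, `seg.name` stays untouched, which matches the "forward direction only" idea.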


Edit: just did a quick check (WLED-MM but should be the same in upstream): yes, unicode can be entered in the segment name (WebUI), it's properly saved in presets.json, and properly displayed when you load a preset. conclusion: unicode already works in WLED, we just did not notice it ;-)

softhack007 avatar Nov 19 '25 10:11 softhack007

The rabbit has created a nice flowchart for the "simplified" UTF8 - to DrawCharacter flow --> working on it, so we can build on something and extend the idea with custom font loading, custom codepages, etc.

sequenceDiagram
    autonumber
    participant App as Application (drawCharacter)
    participant UTF8 as UTF-8 input
    participant U16 as unicodeToWchar16()
    participant CP437 as wchar16ToCodepage437()
    participant Font as console_font_4x6

    Note over App,Font: New optional full-font flow (WLED_ENABLE_FULL_FONTS)
    App->>UTF8: provide UTF-8 string
    UTF8->>U16: unicodeToWchar16(utf8)
    U16-->>CP437: returns uint16_t codepoint
    CP437->>CP437: map codepoint -> CP437 byte (switch table)
    CP437-->>App: CP437 byte
    App->>Font: index glyph using CP437 byte
    Font-->>App: glyph bitmap

only one thing missing: instead of DrawCharacter, the scrolling text has to call a new method DrawText, and pass the complete segment name string for UTF-8 decoding.

softhack007 avatar Nov 19 '25 17:11 softhack007

It could also enable flexible font widths

@ElToberino thanks for the reassurance :-)

You might want to "hold your horses" about flexible width glyphs, though. We are still on "fixed width"; however, it would be possible in future to calculate the real width of each glyph, and advance the next drawing position based on this. Actually that's a "third topic", in addition to font file loading and (limited) unicode support.
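Just to illustrate what "calculate the real width" could mean, here is a small sketch. It assumes one byte per row with the leftmost pixel in bit 7, as in the 6x8 bitmaps posted earlier in this thread; the function name `usedGlyphWidth` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns how many columns of the cell a glyph really uses, counted from the
// left edge up to its rightmost lit pixel. A blank glyph returns 0.
int usedGlyphWidth(const std::uint8_t* rows, int height, int cellWidth) {
  assert(cellWidth > 0 && cellWidth <= 8);
  std::uint8_t merged = 0;
  for (int y = 0; y < height; ++y) {
    merged |= rows[y];  // OR all rows together: one bit per used column
  }
  int width = 0;
  for (int x = 0; x < cellWidth; ++x) {
    if (merged & (0x80 >> x)) width = x + 1;  // remember the rightmost column
  }
  return width;
}
```

The drawing code would then advance by this width plus a spacing column, and fall back to a fixed advance for blank glyphs such as the space.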

softhack007 avatar Nov 20 '25 00:11 softhack007