lv_font_conv

Converter misses opportunity to detect identical glyphs, stores them as separate images

Open pavmick opened this issue 1 year ago • 12 comments

As the title says. I am converting ASCII and Cyrillic ranges. The letter A, for example, is present in both ranges and it is being stored twice. Interestingly, the stored images are slightly different. Same for other identical glyphs. It should not be too troublesome to detect duplicate glyphs and store one copy only.

pavmick avatar Aug 24 '24 06:08 pavmick

How can we know whether the ASCII A is the same as the Cyrillic A? By comparing the rasterized images?

kisvegabor avatar Aug 30 '24 07:08 kisvegabor

I believe font files have facilities that allow different Unicode code points to reference the same glyph. For example, go to https://fontdrop.info/, load arial.ttf, scroll down to Unicode 0410 (Cyrillic letter A), click on it, and observe "This composite glyph is a combination of: glyph 36". If you click on the letter A from the ASCII range (close to the top of the table), you'll see the same index 36.

pavmick avatar Aug 30 '24 10:08 pavmick
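The many-to-one mapping described above can be illustrated with a toy character map. This is a minimal sketch, not how a real TTF `cmap` table is encoded: `cmap_entry_t`, `toy_cmap`, and `glyph_id_for` are hypothetical names, and only glyph index 36 for U+0041 and U+0410 comes from the arial.ttf observation in the thread; the other glyph indices are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a font's code point -> glyph map. */
typedef struct {
    uint32_t codepoint;
    uint16_t glyph_id;
} cmap_entry_t;

static const cmap_entry_t toy_cmap[] = {
    {0x0041, 36},  /* LATIN CAPITAL LETTER A (index reported in the thread) */
    {0x0042, 37},  /* LATIN CAPITAL LETTER B (made-up index) */
    {0x0410, 36},  /* CYRILLIC CAPITAL LETTER A, same glyph as U+0041 */
    {0x0411, 570}, /* CYRILLIC CAPITAL LETTER BE (made-up index) */
};

/* Linear lookup; returns 0 (the conventional "missing glyph" index)
 * when the code point is not mapped. */
uint16_t glyph_id_for(uint32_t codepoint)
{
    for (size_t i = 0; i < sizeof(toy_cmap) / sizeof(toy_cmap[0]); i++) {
        if (toy_cmap[i].codepoint == codepoint) {
            return toy_cmap[i].glyph_id;
        }
    }
    return 0;
}
```

Under these assumptions, `glyph_id_for(0x0041)` and `glyph_id_for(0x0410)` return the same index, which is the signal a converter could use to detect shared glyphs without comparing pixels.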

How many glyphs can be affected by that? I estimate it at 1% at most (but probably closer to 0.1%). What do you think?

kisvegabor avatar Sep 02 '24 16:09 kisvegabor

Let's see. For the Russian alphabet, I would say 11 uppercase and 8 lowercase letters share glyphs with ASCII. That would be 15% of ASCII range.

pavmick avatar Sep 02 '24 16:09 pavmick

Okay, it's really significant in this case.

So the task is to make the duplicated glyphs point to the same bitmap, right? If so, I'm okay with this feature. However, I'm not a JS developer and can't work on the implementation.

Do you have time and interest to implement it?

cc @puzrin

kisvegabor avatar Sep 04 '24 10:09 kisvegabor
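One way to make duplicated glyphs point to the same bitmap is to key the bitmap offset on the source glyph index. The sketch below is in C for illustration only (the converter itself is written in JS) and is not the converter's actual code: `glyph_entry_t`, `assign_bitmap_indices`, and the `MAX_GLYPH_ID` limit are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GLYPH_ID 1024          /* assumption: toy upper bound on glyph ids */
#define NO_BITMAP    UINT32_MAX    /* marker: glyph id not emitted yet */

/* Hypothetical per-code-point record produced by a converter front end. */
typedef struct {
    uint32_t codepoint;
    uint16_t glyph_id;     /* glyph index in the source font */
    uint32_t bitmap_size;  /* bytes needed by the rasterized glyph */
    uint32_t bitmap_index; /* output: offset into the shared bitmap pool */
} glyph_entry_t;

/* Assigns pool offsets so that entries sharing a glyph_id share one bitmap.
 * Returns the total pool size in bytes. */
uint32_t assign_bitmap_indices(glyph_entry_t *entries, size_t count)
{
    uint32_t first_index[MAX_GLYPH_ID];
    for (size_t i = 0; i < MAX_GLYPH_ID; i++) {
        first_index[i] = NO_BITMAP;
    }

    uint32_t next = 0;
    for (size_t i = 0; i < count; i++) {
        uint16_t id = entries[i].glyph_id;
        assert(id < MAX_GLYPH_ID);
        if (first_index[id] == NO_BITMAP) {
            /* First time this glyph is seen: reserve space for it. */
            first_index[id] = next;
            next += entries[i].bitmap_size;
        }
        entries[i].bitmap_index = first_index[id];
    }
    return next;
}
```

With entries for U+0041 and U+0410 that both carry glyph index 36, the second entry reuses the first entry's offset and adds nothing to the pool size.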

Guys, before discussing any changes, it's worth providing proof that the source font has multiple character codes mapped to the same image. If source images are different, that's the intent of the font authors, not a converter issue.

The TTF format has different tables for "images" and "char codes." AFAIK, if an image is referenced by multiple char codes, the converter should preserve that sharing (but I'm not sure and don't remember the details).

puzrin avatar Sep 04 '24 11:09 puzrin

It's also worth using the binary format as the reference. The "lvgl" one is less optimal and is focused on a text representation of the source. The binary format is a close subset of TTF, with minor local changes: raster data and compression instead of vectors.

puzrin avatar Sep 04 '24 11:09 puzrin

So I looked closer at arial.ttf using the fontdrop.info online tool. I can confirm that Russian letters АВЕМНОРТХаенорсух share glyphs with regular ASCII letters. That's 17 glyphs. This set could vary slightly from font to font, but I don't expect major variations. I am mostly an embedded C developer with some knowledge of JS. But I'll see if I can dive into the code and suggest patches.

pavmick avatar Sep 04 '24 12:09 pavmick

> So I looked closer at arial.ttf using the fontdrop.info online tool. I can confirm that Russian letters АВЕМНОРТХаенорсух share glyphs with regular ASCII letters. That's 17 glyphs.

And did you use the same font in the converter when you found the duplicated images? And does the same problem occur in the binary format?

puzrin avatar Sep 04 '24 12:09 puzrin

> And did you use the same font in the converter when you found the duplicated images? And does the same problem occur in the binary format?

Just ran the converter on arial.ttf. Yes, the glyphs in question are duplicated. This time exact copies, to the last bit. I am not using the binary font format in my applications, so I can't confirm this behavior with it.

pavmick avatar Sep 04 '24 13:09 pavmick
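The "exact copies, to the last bit" observation is the kind of check that can be done on the rasterized output itself, as suggested earlier in the thread. A minimal sketch, assuming a hypothetical `raster_t` record that holds a glyph's rendered bytes and bounding box (these names are not part of lv_font_conv or LVGL):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for one rasterized glyph. */
typedef struct {
    const uint8_t *data; /* packed pixel data */
    size_t size;         /* number of bytes in data */
    uint8_t box_w;       /* bounding box width in pixels */
    uint8_t box_h;       /* bounding box height in pixels */
} raster_t;

/* True only if both rasters have the same box and byte-identical pixels. */
bool rasters_identical(const raster_t *a, const raster_t *b)
{
    assert(a != NULL && b != NULL);
    if (a->box_w != b->box_w || a->box_h != b->box_h || a->size != b->size) {
        return false;
    }
    return a->size == 0 || memcmp(a->data, b->data, a->size) == 0;
}
```

A byte comparison like this only catches exact duplicates; the "slightly different" images mentioned in the opening post would not match, which is one reason comparing source glyph indices may be the more robust criterion.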

There is a chance we skipped deduplication to save time. But it's definitely not a restriction of the internal [binary] format (I don't remember about lvgl).

puzrin avatar Sep 04 '24 13:09 puzrin

In LVGL, a glyph can also reference any bitmap_index. See:

 {.bitmap_index = 1307, .adv_w = 128, .box_w = 8, .box_h = 8, .ofs_x = 0, .ofs_y = -1},

kisvegabor avatar Sep 05 '24 09:09 kisvegabor
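To illustrate what sharing would look like in the generated tables, here is a minimal sketch with two descriptors that carry the same `bitmap_index`. `toy_glyph_dsc_t`, `toy_bitmap_pool`, and `toy_glyph_bitmap` are hypothetical, simplified stand-ins rather than LVGL's real types or API; the field values are copied from the descriptor quoted above, and which code points they belong to is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified glyph descriptor (not LVGL's real struct). */
typedef struct {
    uint32_t bitmap_index; /* offset of the glyph's pixels in the pool */
    uint16_t adv_w;        /* advance width */
    uint8_t box_w;
    uint8_t box_h;
    int8_t ofs_x;
    int8_t ofs_y;
} toy_glyph_dsc_t;

/* Stand-in for the generated bitmap array (contents omitted). */
static const uint8_t toy_bitmap_pool[2048] = {0};

static const toy_glyph_dsc_t toy_glyph_dsc[] = {
    /* assumed: U+0041 */
    {.bitmap_index = 1307, .adv_w = 128, .box_w = 8, .box_h = 8, .ofs_x = 0, .ofs_y = -1},
    /* assumed: U+0410, reusing the same bitmap instead of a second copy */
    {.bitmap_index = 1307, .adv_w = 128, .box_w = 8, .box_h = 8, .ofs_x = 0, .ofs_y = -1},
};

/* Returns a pointer to the pixel data of the descriptor at dsc_idx. */
const uint8_t *toy_glyph_bitmap(size_t dsc_idx)
{
    assert(dsc_idx < sizeof(toy_glyph_dsc) / sizeof(toy_glyph_dsc[0]));
    return &toy_bitmap_pool[toy_glyph_dsc[dsc_idx].bitmap_index];
}
```

Both descriptors resolve to the same address in the pool, so the renderer side would need no change; only the converter would have to emit repeated indices.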