Provide better support for mixed writing systems


This is mainly a problem with the AGG backends. The Quartz and QPainter backends already behave correctly.

Basically, user code should be able to call show_text on the following string: "Kiva Graphics一番😎" and have it render correctly even if the currently selected font only supports Latin characters.

To get to this point, we need to do a few things:

[x] Collect writing system information for entries in our font database
[x] Build fallback lists for font families and styles
[x] Find a library to use, or failing that, write our own function which breaks a string up into chunks which share the same writing system (see: https://stackoverflow.com/questions/9868792/find-out-the-unicode-script-of-a-character)
[x] Make low level text drawing functions return the text cursor position after drawing a run of glyphs (or work around the absence by calling get_text_extent on every chunk of a string before drawing)
[ ] Bring everything together in the show_text method so that mixed strings can be drawn
[ ] Bonus: Support bidirectional text mixing
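
As a rough illustration of the string-chunking step in the list above, here is a minimal sketch using only the standard library. It follows the heuristic from the linked Stack Overflow question: guess a character's script from the first word of its Unicode name. The names `guess_script` and `split_by_script`, the small prefix table, and the choice to treat category `So` characters (which includes many emoji) as their own "Symbol" run are all assumptions made for this example, not existing Kiva code. A real implementation would want proper Unicode script data.

```python
import unicodedata

# Hypothetical, deliberately incomplete table: first word of the Unicode
# character name -> script label used by this sketch.
_PREFIX_TO_SCRIPT = {
    "LATIN": "Latin",
    "GREEK": "Greek",
    "CYRILLIC": "Cyrillic",
    "ARABIC": "Arabic",
    "HEBREW": "Hebrew",
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
    "HANGUL": "Hangul",
    "CJK": "Han",
}


def guess_script(char):
    """Heuristically guess the script of a single character."""
    name = unicodedata.name(char, "")
    script = _PREFIX_TO_SCRIPT.get(name.split(" ")[0])
    if script is not None:
        return script
    # Assumption: "Symbol, other" characters (many emoji) get their own run
    # so that a symbol/emoji font can be chosen for them.
    if unicodedata.category(char) == "So":
        return "Symbol"
    # Spaces, digits, punctuation, combining marks, and anything unrecognized.
    return "Common"


def split_by_script(text):
    """Break text into (script, substring) runs.

    "Common" characters join the run before them. A leading "Common" run
    adopts the script of the first non-common character that follows it.
    """
    runs = []  # list of [script, substring]
    for char in text:
        script = guess_script(char)
        if runs and (script == "Common" or runs[-1][0] in (script, "Common")):
            if runs[-1][0] == "Common":
                runs[-1][0] = script
            runs[-1][1] += char
        else:
            runs.append([script, char])
    return [(script, chunk) for script, chunk in runs]


runs = split_by_script("Kiva Graphics一番😎")
```

With the example string from this issue, the sketch yields three runs: a Latin run ("Kiva Graphics"), a Han run ("一番"), and a Symbol run ("😎").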

This is roughly what Qt does, based on a quick skim of the code: https://code.qt.io/cgit/qt/qtbase.git/

QFreeTypeFontDatabase::addTTFile (qtbase.git/tree/src/gui/text/freetype/qfreetypefontdatabase.cpp): Scans a font for the following information: weight, style, fixed-width, supported writing systems (unicode range, codepage range), family name
QPlatformFontDatabase::fallbacksForFamily (qtbase.git/tree/src/gui/text/qfontdatabase.cpp): Takes a style and script ID and returns a list of fonts which support that script with that style (or just support the script)
QPainter::drawText (qtbase.git/tree/src/gui/painting/qpainter.cpp): Basically Qt's show_text. Uses QStackTextEngine to shape the input string and break it into QScriptItem objects, then picks a font per item and draws it.
QStackTextEngine/QScriptItem/QTextItemInt (qtbase.git/tree/src/gui/text/qtextengine.cpp): These are the components which break up a string into chunks which can be shaped and drawn as a unit.
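
To make the Qt-style flow above concrete (per-run font choice from a fallback list, then drawing each run at the cursor position left by the previous one), here is a small pure-Python sketch. Everything in it is hypothetical: the font names, the `coverage` and `fallbacks` dictionaries, and the `measure` callable, which stands in for something like calling get_text_extent on each chunk. It only computes where each run would be placed and does not call any real drawing API.

```python
def pick_font(selected, script, coverage, fallbacks):
    """Return the first font covering `script`, preferring the selected font."""
    for font in [selected, *fallbacks.get(selected, [])]:
        if script in coverage.get(font, set()):
            return font
    # Nothing covers the script: fall back to the selected font anyway.
    return selected


def place_runs(runs, selected, coverage, fallbacks, measure, origin=0.0):
    """Assign a font and an x offset to each (script, text) run.

    `measure(font, text)` must return the advance width of `text` in `font`.
    Returns the placed runs and the final cursor position.
    """
    x = origin
    placed = []
    for script, text in runs:
        font = pick_font(selected, script, coverage, fallbacks)
        placed.append((font, text, x))
        x += measure(font, text)  # advance the text cursor past this run
    return placed, x


# Hypothetical font data for illustration only.
coverage = {
    "Latin Sans": {"Latin"},
    "CJK Sans": {"Latin", "Han", "Hiragana", "Katakana"},
    "Emoji": {"Symbol"},
}
fallbacks = {"Latin Sans": ["CJK Sans", "Emoji"]}
runs = [("Latin", "Kiva Graphics"), ("Han", "一番"), ("Symbol", "😎")]

# Fake metric: every character is 10 units wide.
placed, end_x = place_runs(
    runs, "Latin Sans", coverage, fallbacks,
    measure=lambda font, text: 10.0 * len(text),
)
```

Returning the final cursor position mirrors the checklist item about low-level drawing functions reporting where the cursor ends up after a run of glyphs.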

Mar 26 '21 16:03 jwiggins

Lovely: https://raphlinus.github.io/rust/skribo/text/2019/04/04/font-fallback.html

Mar 30 '21 10:03 jwiggins

Copying from #767 so it's easier to find:

Having played with [mapping of "Han" to a CJK language] a bit more, we should only use [the locale-based guess] when it's not otherwise clear from the context. For instance if a string already contains Hiragana or Katakana, then Han should be mapped to "Japanese". If Hangul is encountered, Han maps to "Korean". Only if the Han is mixed with some non-CJK language should we fall back to this locale-based guess.
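
The rule described above could look roughly like the following sketch. The function name, the script labels, and the `locale_language` parameter are hypothetical. Two details are assumptions that the comment does not settle: kana is checked before Hangul when both appear, and a string containing only Han is treated the same as Han mixed with non-CJK text (it uses the locale-based guess).

```python
def resolve_han_language(scripts, locale_language):
    """Decide which CJK language "Han" characters should map to.

    `scripts` is an iterable of script labels found in the string;
    `locale_language` is the locale-based guess used as a last resort.
    """
    scripts = set(scripts)
    if scripts & {"Hiragana", "Katakana"}:
        return "Japanese"
    if "Hangul" in scripts:
        return "Korean"
    # No contextual hint: fall back to the locale-based guess.
    return locale_language


language = resolve_han_language(["Latin", "Han", "Katakana"], "Chinese")
```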

Apr 06 '21 10:04 jwiggins

Consider libgrapheme or utf8proc for classifying graphemes in a string.

Dec 22 '21 19:12 jwiggins
