xterm.js
Support font ligatures
Support being removed in https://github.com/sourcelair/xterm.js/pull/938
It's certainly still possible, the renderer needs to know which characters to join though.
I think we are not interested in the regular ligatures (in fact, it would be bad to enable ligatures for li and such), but in things like ==, !==, => and so on. Here's a nice list of the ligatures supported by the Fira Code monospace font:
https://github.com/tonsky/FiraCode/blob/master/showcases/all_ligatures.png
What would happen if half of the ligature is a different color? How would you handle that?
@LoganDark does that even work with regular ligatures? Or are they split up when they're different colors?
No, and no.
i wonder if it would be possible to pull render code from txtjs, as they have figured out how to render ligatures, although i think they manually draw the text. http://txtjs.com/examples/Text/ligatures.html
@devsnek I don't think the issue is actually rendering the text. The issue is knowing which characters join so they can be drawn together (currently "==" is drawn as "=" and "=", not "==").
@Tyriar wouldn't the font renderer take care of that without our intervention?
@devsnek yes, but each cell in the grid is drawn individually with a few exceptions (emojis, wide unicode chars). Ligatures need to somehow be included in that list. Look at https://github.com/sourcelair/xterm.js/pull/938 for more context
@devsnek IT'S YOU
disclaimer: completely off topic, just a random comment
@LoganDark does that even work with regular ligatures? Or are they split up when they're different colors?
If we're going by how other emulators handle them, they do get split up if they are different colors, as one would expect. This is almost a requirement since this allows symbols to be correctly represented by some language highlighters in Vim, for example.
I think it's entirely acceptable to show the individual symbols if the render mode is not the same for all of the underlying characters.
@Qix- My suggestion then would be to draw all the text at once and then do coloring in post. That would eliminate any issues with ligatures, and wouldn't require detecting ligature pairs (although it would break compatibility with variable-width fonts, or even monospace fonts that are slightly off/don't have integer widths)
@LoganDark multi-colored ligatures would look bizarre and there would be no clear way to color them IMO.
Yeah I don't think multi-color ligatures would work. It also goes against how they work underneath, where a single glyph is drawn at the start, not multiple.
As a clarification, this is waiting on a good solution for detecting which character sequences have ligatures. To do this properly it would probably involve low level code that checks font files (and thus would need to be a native node module and not work for web consumers), I don't think this information is exposed to the web platform.
Now that hyper released 2.0.0 stable, maybe ligature workarounds need a higher priority.
ref https://github.com/zeit/hyper/issues/914#issuecomment-361034148
Determining the glyph mappings manually is a tough nut to crack. From what I can tell, making a decent experience out of this would require the following:
- Map the selected font family back to the file/buffer containing the actual font data (otf/ttf/woff/etc)
- Parse the data from the GSUB table of the font and translate that into a sensible set of rules for glyph replacement
- Pass some sort of map or mapping function to xterm to determine what to render for a given character sequence
I've done some initial poking around with Fira Code (specifically its nerd font variant) to try to figure out how difficult each step might be. I haven't yet decided whether I'm feeling ambitious enough (or care about font ligatures enough) to take this on, but here's what I've found so the knowledge isn't lost:
- There is no way that I can find to fetch the font data using browser APIs, so this won't work as a direct feature of xterm.js, but more likely as a separate package/extension with a hook exposed by xterm.js.
- Mapping the CSS font-family name of a font back to its font file in Windows is painful but appears doable. So far the only way I've found is to fetch everything in %WINDIR%\Fonts and parse every file I find (spoiler: it's really slow). Haven't tried other platforms yet. (Note: I also tried lifting the name from the registry but the naming doesn't line up for some fonts such as the ones from nerd fonts. They use a "preferred" family and subfamily which isn't picked up in the registry's name but is used in the CSS font-family. If you're curious, the registry key is in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Fonts)
- There's a library called opentype.js which does full parsing of OpenType font tables and even has a font.stringToGlyphs(str) function which handles basic ligatures, but the ligatures for Fira Code (and several if not all of the other common ligature fonts) use a feature called contextual alternates which is not yet supported by opentype.js (listed under Planned). The necessary GSUB table is parsed to JSON for you though, so theoretically all that's missing is the interpretation of the table data.
- Fira Code ligatures (and I think others) actually replace the original glyphs with equal numbers of glyphs, rather than a single extra wide one (to maintain the properties of the monospace font). For a string like ==>, the font will end up telling you to replace it with two "LIG" characters (essentially an empty space), followed by the actual glyph for the ligature, which is still technically only one monospace character wide. Despite being ostensibly single width, the path of the last character extends beyond the left side of the character's bounding box to cover the space occupied by the two LIG characters before it. See below for a visual (0 and 600 are the sides of the character's box). I don't know if this complicates the task of the renderer or if it would have to be transformed before being passed to xterm.js, but something to be aware of.
- Another wrinkle in this is determining when to re-evaluate the ligature. For example, if I type = four times consecutively, the desired behavior would be for me to see a single equals, then a double equals ligature, then a triple equals ligature, then four separate equals signs. There are actually mappings in the contextual alternates rules for clearing out the ligature (i.e. if the current input is '=' and the previous three characters are '===', don't remap it at all), but we'd have to figure out how to apply those rules.
- OpenType is complicated. Admittedly, I'm not a font rendering expert, but the number of possible variations that lead to different types of rendering is pretty extensive. Without a built-in library that does the mapping for us, I think the most reasonable way to attack this is incrementally. Fira Code specifically uses Chaining Context Substitution Format 3, but I'm sure there are other popular fonts that use different ones. Since each has slightly different semantics, it probably makes sense to start with one and go from there.
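To make the re-evaluation wrinkle concrete, here is a minimal sketch of context-dependent matching. It is a toy model, not an interpretation of the real GSUB data: the ToyRule shape, the findLigatureRanges name and the two rules are assumptions made up for illustration, and real chaining context rules are considerably richer.

```typescript
// Toy model (assumption): a ligature is a character sequence that is
// cancelled when the character directly before or after it equals `blocker`.
interface ToyRule {
  sequence: string; // characters that would be drawn as one ligature
  blocker: string;  // neighbouring character that cancels the ligature
}

// Returns [start, end) ranges (end exclusive) of characters to draw together.
function findLigatureRanges(text: string, rules: ToyRule[]): [number, number][] {
  const ranges: [number, number][] = [];
  let i = 0;
  while (i < text.length) {
    let matched = false;
    for (const rule of rules) {
      const end = i + rule.sequence.length;
      if (!text.startsWith(rule.sequence, i)) continue;
      // Lookbehind / lookahead: out-of-range indexes yield undefined,
      // which never equals the blocker.
      if (text[i - 1] === rule.blocker || text[end] === rule.blocker) continue;
      ranges.push([i, end]);
      i = end;
      matched = true;
      break;
    }
    if (!matched) i++;
  }
  return ranges;
}

// Hypothetical rules, longest first so "===" wins over "==".
const rules: ToyRule[] = [
  { sequence: "===", blocker: "=" },
  { sequence: "==", blocker: "=" },
];

// Typing "=" four times: no ligature, then "==", then "===", then none again.
for (const typed of ["=", "==", "===", "===="]) {
  console.log(typed, JSON.stringify(findLigatureRanges(typed, rules)));
}
```

The point of the sketch is only that the answer for a given position can change as more characters arrive, so the whole run has to be re-evaluated rather than just the newest character.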
@princjef Thanks for sharing your explorations, really really helpful! I have also been putting some thoughts into this topic a couple days ago, and I came to the following conclusion:
- Detecting ligatures in webfonts using a built-in API seems impossible (just as you describe)
- There are certain ligatures that don't make sense in a terminal, even if the font supports them (e.g. li)
- There is a very small subset of ligatures that make sense to be supported, namely most of the Fira Code ligatures.
- Adding support for drawing and clearing a character that spans multiple cells is very hard to implement with our current single-character based rendering approach (CharAtlas).
TBH, I don't think it's worth the effort to support ligatures at the current state
@mofux I actually think I've found a (somewhat palatable) way of getting the ligatures to render in xterm.js by approaching it from a different angle. I somehow missed that canvas will automatically render the ligatures for you in my initial investigations. Supporting ligatures becomes a matter of making sure that the relevant characters are rendered together.
To that end, I tweaked the text renderer to render characters with identical attributes (fg, bg, etc.) as a single group. This won't quite render all ligatures correctly (and might render some that shouldn't be), but it should render a ligature anywhere someone would expect to see one. Here's a screenshot of NeoVIM in the demo app using Fira Code (shown in Firefox, also works in Chrome but not Edge):
Branch is here if people want to take a look: https://github.com/princjef/xterm.js/tree/ligature-support
A couple of notes on this:
- I haven't done any benchmarking, but I'm betting this isn't going to be good for performance. By grouping all same-styled characters together, the character atlas basically gets thrown out the window, even for places where there are no ligatures (since we don't know where they will/won't be). When I render multiple characters together, I just set the character code to Infinity to ensure I'll avoid any caching.
- There are probably some edge cases around character width and overlap that I'm not correctly dealing with. This is mostly a proof-of-concept at this point.
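For readers who want a picture of the grouping step, the following is a rough sketch of collecting same-styled cells into runs that could each be handed to a single canvas text draw. The Cell and Run shapes and the groupRuns name are assumptions for illustration and do not mirror the actual xterm.js buffer or the code in the branch.

```typescript
// Hypothetical cell model: one character plus the attributes that must match.
interface Cell {
  char: string;
  fg: number;
  bg: number;
  flags: number; // bold, italic, etc. packed into one number (assumption)
}

// A run of adjacent cells sharing identical attributes.
interface Run {
  text: string;
  startCol: number;
  fg: number;
  bg: number;
  flags: number;
}

function groupRuns(cells: Cell[]): Run[] {
  const runs: Run[] = [];
  cells.forEach((cell, col) => {
    const last = runs[runs.length - 1];
    if (last && last.fg === cell.fg && last.bg === cell.bg && last.flags === cell.flags) {
      // Same style as the previous cell: extend the current run.
      last.text += cell.char;
    } else {
      // Style changed (or first cell): start a new run at this column.
      runs.push({ text: cell.char, startCol: col, fg: cell.fg, bg: cell.bg, flags: cell.flags });
    }
  });
  return runs;
}

// "x==y" where the operator has a different foreground colour.
const row: Cell[] = [
  { char: "x", fg: 1, bg: 0, flags: 0 },
  { char: "=", fg: 2, bg: 0, flags: 0 },
  { char: "=", fg: 2, bg: 0, flags: 0 },
  { char: "y", fg: 1, bg: 0, flags: 0 },
];
console.log(groupRuns(row).map((r) => `${r.startCol}:${r.text}`));
```

Each run would then be drawn with one text call so the font shaper sees the adjacent characters together, which is also why the per-character atlas no longer helps for those runs.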
What about rendering the text using the character atlas, then rerendering it in the background as a block of text while idle? If the two result in the same image, you can throw out the combined text and switch back to the atlas. By splitting up strings of text in the background, it might be possible to learn which strings of text are ligatured.
The tricky part with that is that the rendering of a ligature is dependent not only on the characters replaced by the ligature but also their context. For instance, '===1' should render the three equal signs as a ligature but '====' should render the same three equal signs as separate characters. There is no limit on how large this context can be, so it would be hard and likely error-prone to determine the rules of when a ligature is rendered just from its output.
A more reliable (but less portable) approach is to have the hints about ligature ranges provided by a separate function which knows the font metadata. Then everything but the ranges provided by the external function can be rendered using the atlas, while the groups given could use an approach like the one above. Determining the locations of substitutions given a line of text should be pretty fast but has some of the problems I detailed earlier (mainly speed/reliability of finding the right font on initialization and complexity of processing OpenType contextual alternates).
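A minimal sketch of that split, assuming the external function returns non-overlapping [start, end) ranges for a line: everything outside the ranges stays on the per-character atlas path and only the ranges are drawn as groups. The Segment shape and splitByRanges name are hypothetical.

```typescript
type Range = [number, number]; // [start, end), end exclusive

interface Segment {
  text: string;
  start: number;   // column where the segment begins
  joined: boolean; // true: draw as one group; false: draw per character via the atlas
}

// Assumes the ranges do not overlap; they are sorted here before use.
function splitByRanges(line: string, ranges: Range[]): Segment[] {
  const segments: Segment[] = [];
  let pos = 0;
  for (const [start, end] of [...ranges].sort((a, b) => a[0] - b[0])) {
    if (start > pos) {
      segments.push({ text: line.slice(pos, start), start: pos, joined: false });
    }
    segments.push({ text: line.slice(start, end), start, joined: true });
    pos = end;
  }
  if (pos < line.length) {
    segments.push({ text: line.slice(pos), start: pos, joined: false });
  }
  return segments;
}

console.log(splitByRanges("a => b", [[2, 4]]));
```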
Would it be reasonable to have a setting to enable ligatures, which writes an entire line at a time to the terminal? It seems like this would guarantee correct rendering, and the performance hit would be an opt-in for folks who care more about ligatures than speed.
@princjef Wow, your edits are so easy to follow and it does seem like xterm can support ligatures with the renderer. The trick is just to figure out how to map the rules. This gives me hope.
@princjef great investigation!
There is no way that I can find to fetch the font data using browser APIs, so this won't work as a direct feature of xterm.js, but more likely as a separate package/extension with a hook exposed by xterm.js
@princjef I see eventual ligature support living as:
- some-magical-package: A native node module pulls in native font information, mapping web font strings to the files if possible. This could be done in an AsyncWorker to reduce the performance hit. It's no big deal if this is done lazily and on the first launch it takes several seconds to scan and do a refresh.
- xtermjs/xterm-ligature-support: An addon that depends on node and uses some-magical-package to get and cache font info. This addon could do a scan whenever unrecognized fonts are detected in the fonts directory and evict fonts when they are removed. I expect something like this as the rough API:
/** Returns a list of characters to be drawn together */
export function getLigatures(fontFamily: string): string[] { ... }
There are certain ligatures that don't make sense in a terminal, even if the font supports it (e.g. li)
@mofux I'm not sure we need to worry about this? If the user requests ligatures shouldn't we render them all?
Adding support for drawing and clearing a character that spans multiple cells is very hard to implement with our current single-character based rendering approach (CharAtlas).
@mofux We should be able to pretty easily add a function to draw a set of adjacent characters together.
@princjef: I haven't done any benchmarking, but I'm betting this isn't going to be good for performance.
@wavebeem: Would it be reasonable to have a setting to enable ligatures, which writes an entire line at a time to the terminal?
I expect it to negate almost completely the performance improvements that come with using the canvas. Also I want to keep the number of options to a minimum, we should try to get to a place where ligatures just work when an addon is included to add support.
Also with regards to performance I will likely be working on adding a fallback DOM rendering option in the coming months as there are just too many things that can go wrong with GPU support in Chromium. See https://github.com/xtermjs/xterm.js/issues/1360. Ligatures will work out of the box in this mode.
What about rendering the text using the character atlas, then rerendering it in the background as a block of text while idle? If the two result in the same image, you can throw out the combined text and switch back to the atlas.
@j-f1 this is more difficult than it seems. Even if the characters are rendered the same the second one will be different since the spacing in the char atlas is different (characters are always drawn on a round number). We would need to do a whole lot more rendering for this to work and a lot of diffing pixels which is expensive.
@Tyriar I think the general design you've described makes sense. We may be able to get away with something that doesn't depend on native code for finding the fonts in some-magical-package, but it will definitely be dependent on the platform/filesystem. I've started playing around with parsing the contextual alternates substitutions but it's hard to say how much more it will take to get it right.
I also think we'll end up needing a slightly different interface to some-magical-package:
export function getSubstitutionRanges(fontFamily: string, text: string): Promise<[number, number][]>;
The rules for determining the ligatures themselves are both complicated and baked pretty deep into the font itself. Rather than pass the font data itself and put the burden of interpreting it on xterm.js, I would leave that to the other lib and have it tell xterm.js which characters should be rendered together. The lookahead/lookbehind aspects of the context also complicate the parsing. For example, '===' maps to a ligature, but not if it is followed by another equals sign.
Another note about the substitution ranges concept: There isn't a clear delineation of the boundaries of a single ligature from what I can tell (at least when using contextual alternates). There are just sequences of substitutions applied to individual characters. I've found some tricks for figuring out the boundaries if you have consecutive ligatures, but it's probably not foolproof. I would probably err on the side of accidentally treating two ligatures as one rather than accidentally splitting them apart since they should still render correctly if rendered all together as a single group. The only real issue there is applying heterogeneous styles to it.
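One way to err on the side of treating two ligatures as one is to merge any ranges that touch or overlap before handing them to the renderer. This is only a sketch of that idea under the assumption that ranges are [start, end) pairs; mergeRanges is a hypothetical helper, not part of any existing package.

```typescript
type Range = [number, number]; // [start, end), end exclusive

// Merges ranges that overlap or sit directly next to each other, so that
// consecutive ligatures end up being drawn together as a single group.
function mergeRanges(ranges: Range[]): Range[] {
  const sorted = [...ranges].sort((a, b) => a[0] - b[0]);
  const merged: Range[] = [];
  for (const [start, end] of sorted) {
    const last = merged[merged.length - 1];
    if (last && start <= last[1]) {
      // Touching or overlapping: grow the previous range.
      last[1] = Math.max(last[1], end);
    } else {
      merged.push([start, end]);
    }
  }
  return merged;
}

console.log(JSON.stringify(mergeRanges([[0, 2], [2, 5], [7, 9]])));
```

The trade-off is the one described above: a merged group still renders correctly when drawn in one call, whereas a ligature that is accidentally split does not.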
The only real issue there is applying heterogeneous styles to it.
Just don't. Pass each string of continuous style to the function separately. You might be able to make an exception for underlined text if the underline is drawn separately.
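As a sketch of what passing each run separately could look like: the range finder is called once per same-styled run and the results are shifted back to line columns. The StyledRun shape, the rangesForLine name and the synchronous find callback are assumptions for illustration; the interface proposed earlier in the thread returns a Promise instead.

```typescript
type Range = [number, number]; // [start, end), end exclusive

// A run of text with one continuous style, starting at a given column.
interface StyledRun {
  text: string;
  startCol: number;
}

// Calls the (hypothetical, synchronous) finder per run so that a ligature
// can never span a style boundary, then offsets the ranges to line columns.
function rangesForLine(runs: StyledRun[], find: (text: string) => Range[]): Range[] {
  const out: Range[] = [];
  for (const run of runs) {
    for (const [start, end] of find(run.text)) {
      out.push([start + run.startCol, end + run.startCol]);
    }
  }
  return out;
}

// Stand-in finder that only knows about "=>" (assumption, for the demo).
const findArrow = (text: string): Range[] => {
  const idx = text.indexOf("=>");
  return idx === -1 ? [] : [[idx, idx + 2]];
};

// "=>" inside one run is joined; "=" and ">" in different runs are not.
console.log(JSON.stringify(rangesForLine([{ text: "x ", startCol: 0 }, { text: "=>", startCol: 2 }], findArrow)));
console.log(JSON.stringify(rangesForLine([{ text: "x =", startCol: 0 }, { text: "> y", startCol: 3 }], findArrow)));
```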
We may be able to get away with something that doesn't depend on native code
The rules for determining the ligatures themselves are both complicated and baked pretty deep into the font itself. Rather than pass the font data itself and put the burden of interpreting it on xterm.js, I would leave that to the other lib and have it tell xterm.js which characters should be rendered together.
@princjef this sounds even better, the less xterm.js has to do in this area the better.
The only real issue there is applying heterogeneous styles to it.
I don't think anything else attempts to do this, we should only add ligatures for text that uses the same style.
This is looking doable on the font side. I have some code that successfully parses all of the Fira Code ligatures and provides the right ranges for the characters to combine. If people have one or two other fonts that they're looking for support on I can try to check those as well. So far I've only implemented the substitution types that I needed for Fira Code, so some variety would be welcome to exercise the other substitution types.
Still need to figure out the font lookup part. Going to focus on that next. There are some packages out there but they all seem to be either buggy or poorly maintained
@princjef If you want to check other fonts, I'm using Iosevka.
Alright, I've created a package called font-ligatures (a.k.a. some-magical-package) and some associated packages so that we can efficiently find the right font and then figure out where the ligatures are for a given text input.
I spent some time optimizing the process of finding the font. On a Surface Pro 4 with ~150 ttf/otf fonts I can fetch the font metadata for all of them in 300-400ms. It's mostly I/O bound and can be kicked to the background for the first few render cycles while it loads, but should be fast enough to be loaded by the time a pty has started up and spit out some text. Once it's loaded we can trigger a render to update whatever text might already be present. This can be repeated whenever the font changes or we can cache the full list the first time (I fetch the full list at the beginning anyway).
As for the ligature mapping itself, the library takes in a string and returns metadata about the font ligatures, including the groups of characters that should be rendered together. The CI includes tests for every ligature in Fira Code, Iosevka and Monoid, so I'm reasonably confident that it does the processing correctly for the substitution types it performs (though I'm sure there are some fonts out there that use other types which I haven't implemented).
However, I have spent no time optimizing/tuning the ligature parsing. I did some quick tests and it looks like parsing ligatures takes 2-20ms for a moderate length string (read: 1 line). There's still a lot of room to optimize so I'm not too worried at the moment. I mostly wanted to get this out to demonstrate the interface and let people kick the tires if they want.
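Until the parsing itself is tuned, one cheap mitigation a consumer could try is caching results per line of text, since terminals tend to redraw the same lines repeatedly. The sketch below is an assumption about how that might look and is not part of font-ligatures; memoizeRanges and the maxEntries default are made up for illustration.

```typescript
type Range = [number, number];

// Wraps a range finder with a small least-recently-used cache keyed by the
// input text. Map preserves insertion order, which is used for eviction.
function memoizeRanges(
  find: (text: string) => Range[],
  maxEntries = 1000,
): (text: string) => Range[] {
  const cache = new Map<string, Range[]>();
  return (text: string): Range[] => {
    const hit = cache.get(text);
    if (hit !== undefined) {
      // Re-insert to mark this entry as most recently used.
      cache.delete(text);
      cache.set(text, hit);
      return hit;
    }
    const result = find(text);
    cache.set(text, result);
    if (cache.size > maxEntries) {
      // The first key is the least recently used one.
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }
    return result;
  };
}

// Demo with a counting stand-in for the expensive parse.
let parseCalls = 0;
const slowFind = (text: string): Range[] => {
  parseCalls++;
  return text.includes("=>") ? [[text.indexOf("=>"), text.indexOf("=>") + 2]] : [];
};
const cachedFind = memoizeRanges(slowFind, 2);
cachedFind("a => b");
cachedFind("a => b");
console.log(parseCalls); // the second call is served from the cache
```

The cache would need to be dropped whenever the font changes, since the ranges depend on the font's rules.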
Looks pretty cool @princjef! What do you think about adding tests to Fira Code for 0xc0ffee, 0x1234, and 17x32? (The x turns into a times sign on those)