
Some ligatures are not being applied

Status: Open. peppidesu opened this issue 11 months ago • 4 comments

With certain fonts, some ligatures are not applied, even when the corresponding font features are explicitly enabled.

How to reproduce

Font: Maple Mono NF
Features: calt

Edit an example from the project to render the following string with the given font and features:

/* /** ++ +++ <-> <|> ||> |> --> -> != !== =!= :?> ||= |= =/= <-< |=> -~ </> /> <!---->

Every ligature in this string is either not applied at all, only partially applied, or applied incorrectly:

[Image: cosmic-text rendering of the string]

Other ligatures, such as => and ==, work fine.

For reference, this is what the same string should look like:

[Image: expected rendering of the same string]

EDIT: I have confirmed the following fonts also experience this issue:

  • Iosevka
  • Fira Code

peppidesu • Apr 07 '25 12:04

I might have found the culprit; please correct me if I'm wrong:

If you run the modified example from the issue description with RUST_LOG=trace, the following output appears:

[2025-04-07T17:16:24Z TRACE cosmic_text::shape]       Word: '/'
[2025-04-07T17:16:24Z TRACE cosmic_text::shape]       Run []: '/'
[2025-04-07T17:16:24Z TRACE cosmic_text::shape]       Word: '* /'
[2025-04-07T17:16:24Z TRACE cosmic_text::shape]       Run []: '* /'
[2025-04-07T17:16:24Z TRACE cosmic_text::shape]       Word: '**'
[2025-04-07T17:16:24Z TRACE cosmic_text::shape]       Run []: '**'
...
[2025-04-07T17:16:24Z TRACE cosmic_text::shape]       Word: '<-'
[2025-04-07T17:16:24Z TRACE cosmic_text::shape]       Run []: '<-'
[2025-04-07T17:16:24Z TRACE cosmic_text::shape]       Word: '>'
[2025-04-07T17:16:24Z TRACE cosmic_text::shape]       Run []: '>'
[2025-04-07T17:16:24Z TRACE cosmic_text::shape]       Word BLANK: ' '
[2025-04-07T17:16:24Z TRACE cosmic_text::shape]       Run []: ' '
...

It seems that cosmic_text splits character sequences at line-break opportunities and shapes each resulting word in isolation. This removes the context the shaper needs to apply certain ligatures. Not every ligature is affected: <-, for example, happens not to be split.
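To make the effect concrete, here is a small sketch with hypothetical helpers (not cosmic-text code) that splits a string at given byte offsets and reports which ligature occurrences straddle a split. The offsets [1, 4] for "/* /**" and [2] for "<->" are read off the visible part of the trace output above; the rest is illustration only.

```rust
/// Split `text` into words at the given byte offsets (hypothetical break
/// positions, here taken from the trace log).
fn split_at_breaks<'a>(text: &'a str, breaks: &[usize]) -> Vec<&'a str> {
    let mut words = Vec::new();
    let mut start = 0;
    for &b in breaks {
        words.push(&text[start..b]);
        start = b;
    }
    words.push(&text[start..]);
    words
}

/// Return the byte offset of every occurrence of `lig` in `text` that
/// straddles a break, i.e. whose characters end up in different words.
fn split_ligatures(text: &str, lig: &str, breaks: &[usize]) -> Vec<usize> {
    text.match_indices(lig)
        .map(|(pos, _)| pos)
        .filter(|&pos| breaks.iter().any(|&b| b > pos && b < pos + lig.len()))
        .collect()
}
```

With those offsets, both occurrences of /* are split, and so is <->, while <- stays inside a single word, which matches what the trace shows.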

peppidesu • Apr 07 '25 17:04

I've run into this issue as well, using JetBrains Mono with the following sequences:

   !   !!   !=  !==    "    #   ##  ###
####    $    %    &   &&  &&&   &=    '
   (    )    *  ***   */    +   ++  +++
   ,   --  ---   ->    .   ..  ...    /
  /*   //  ///   /=   /\   \/    0    1
   2    3    4    5    6    7    8    9
   :   ::  :::   :=   :>    ;   ;;    <
  <-   <:   <<  <<<   <=  <==   <>   ==
 ===  ==>  ==>   =>    >   >=   >>  >>>
   ?   ??   ?=    @    A    B    C    D
   E    F    G    H    I    J    K    L
   M    N    O    P    Q    R    S    T
   U    V    W    X    Y    Z    [    \
   ]    ^   ^=    _   __    `    a    b
   c    d    e    f    g    h    i    j
   k    l    m    n    o    p    q    r
   s    t    u    v    w    x    y    z
   {    |   |=   ||  ||=    }    ~   ~=
  ~~

Zoomed in, with cosmic-text in the foreground and VS Code in the background. VS Code applies more of the ligatures than cosmic-text, but it does not apply all of them either:

[Image: zoomed-in comparison, cosmic-text (foreground) vs. VS Code (background)]

tigregalis • Apr 14 '25 16:04

Here are my findings on this issue:

  • We need to know which character sequences have associated ligatures.
  • When dividing the line into segments, we need to disregard Unicode break opportunities that would split such character sequences.
  • (correctness) We should not discard these break opportunities entirely, because the Unicode specification requires that line wrapping be allowed there, even if it breaks a ligature.
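A minimal sketch of the second and third points, assuming the ligature sequences are already known (here a hard-coded, hypothetical list rather than data read from the font's tables): break opportunities that fall inside a ligature sequence are not used to split the text before shaping, but are kept aside as wrap opportunities instead of being discarded.

```rust
/// Partition Unicode line-break opportunities (byte offsets) into:
/// - `segment`: offsets where the text may be cut into separately shaped segments,
/// - `deferred`: offsets inside a known ligature sequence. They remain valid
///   wrap opportunities, but are not used to split the text before shaping.
fn partition_breaks(
    text: &str,
    breaks: &[usize],
    ligatures: &[&str], // hypothetical: in practice this would come from the font
) -> (Vec<usize>, Vec<usize>) {
    // Byte ranges covered by ligature occurrences (overlapping ranges are harmless).
    let covered: Vec<(usize, usize)> = ligatures
        .iter()
        .flat_map(move |lig| {
            text.match_indices(*lig)
                .map(move |(pos, _)| (pos, pos + lig.len()))
        })
        .collect();
    // A break strictly inside any covered range is deferred; all others segment.
    breaks
        .iter()
        .copied()
        .partition(|&b| !covered.iter().any(|&(s, e)| b > s && b < e))
}
```

For "a <-> b" with hypothetical break opportunities at offsets 2, 4 and 6, the break at 4 (between - and >) is deferred, while 2 and 6 are still used for segmenting. If the wrapper later has to use the deferred break, the text on both sides of it must be reshaped.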

peppidesu • Jul 22 '25 08:07

In response to @jackpot51's comment in #392:

The relevant code is in the build() function of ShapeSpan: https://github.com/pop-os/cosmic-text/blob/eebdd01a8e0c4298d316c893e8ada683d396bec9/src/shape.rs#L805-L849

Here, the span is split into words at the positions where the Unicode line breaking algorithm allows a line break. This serves two purposes: it makes caching more granular, and it implements the first part of word wrapping.
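The caching benefit of splitting into words can be sketched as follows. This is a hypothetical per-word cache, not cosmic-text's actual cache, and shape_word is a stand-in that maps each char to a fake glyph id. Because each word is shaped independently, changing one word only costs one cache miss; for the same reason, the shaper never sees text across a word boundary.

```rust
use std::collections::HashMap;

/// Stand-in for a real shaper (hypothetical): one fake "glyph id" per char.
fn shape_word(word: &str) -> Vec<u32> {
    word.chars().map(|c| c as u32).collect()
}

/// Per-word shaping cache keyed by the word's text (hypothetical sketch).
struct WordCache {
    map: HashMap<String, Vec<u32>>,
    misses: usize,
}

impl WordCache {
    fn new() -> Self {
        WordCache { map: HashMap::new(), misses: 0 }
    }

    /// Shape `word` only if it has not been seen before.
    fn get(&mut self, word: &str) -> &[u32] {
        if !self.map.contains_key(word) {
            self.misses += 1;
            self.map.insert(word.to_string(), shape_word(word));
        }
        &self.map[word]
    }

    /// Concatenate the (possibly cached) glyphs of each word in a line.
    fn shape_line(&mut self, words: &[&str]) -> Vec<u32> {
        let mut glyphs = Vec::new();
        for w in words {
            glyphs.extend_from_slice(self.get(w));
        }
        glyphs
    }
}
```

Shaping ["foo ", "bar"] and then ["foo ", "baz"] causes three misses in total: "foo " is shaped once and reused.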

However, these Unicode break opportunities can fall inside sequences that would otherwise be shaped into a single glyph cluster (such as a ligature). Since harfrust is fed each ShapeWord separately, it cannot apply the ligature. This is the culprit.

I've looked into harfrust further and found two functions relevant to this problem, which suggest taking a different approach altogether: GlyphInfo::unsafe_to_break() and GlyphInfo::unsafe_to_concat().
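Roughly speaking, in HarfBuzz-style shapers such as harfrust, the unsafe-to-break flag on a glyph means that splitting the text at the start of that glyph's cluster and shaping both halves separately may give a different result. The sketch below models this with a hypothetical Glyph struct instead of harfrust's GlyphInfo, assumes left-to-right text with non-decreasing cluster values, and lists the cluster boundaries where a split is safe.

```rust
/// Hypothetical stand-in for a shaped glyph: its cluster (byte offset of the
/// source text) and whether the shaper flagged it as unsafe to break.
struct Glyph {
    cluster: usize,
    unsafe_to_break: bool,
}

/// Cluster start offsets where the text could be split and both halves shaped
/// separately without changing the result. A boundary is unsafe if any glyph
/// in the cluster that starts there carries the unsafe-to-break flag.
fn safe_break_offsets(glyphs: &[Glyph]) -> Vec<usize> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < glyphs.len() {
        let c = glyphs[i].cluster;
        // Walk all glyphs belonging to cluster `c` (clusters are monotonic in LTR).
        let mut j = i;
        let mut unsafe_here = false;
        while j < glyphs.len() && glyphs[j].cluster == c {
            unsafe_here |= glyphs[j].unsafe_to_break;
            j += 1;
        }
        // The start of the text is not a break point.
        if i > 0 && !unsafe_here {
            out.push(c);
        }
        i = j;
    }
    out
}
```

A ligature like -> merges its source characters into one cluster, so the offset between - and > is never offered as a boundary at all; a flagged glyph additionally rules out the boundary in front of its cluster.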

For starters, we cannot simply shape an entire span and then wrap the result as-is, because a line break can require some glyph clusters to be reshaped, or can split a glyph cluster in two. We also cannot know where the clusters are before shaping. So unless we have information about the previous glyph run and input text, the best we can do is:

  1. shape the entire span as a whole
  2. figure out where line breaks need to be inserted
  3. reshape the glyph clusters that these line breaks have invalidated
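The three steps can be sketched with a toy shaper. Everything here is hypothetical: the ligature table, the Glyph struct and the greedy shape function stand in for harfrust, and the line breaks are passed in rather than computed. Because the toy shaper has no contextual effects beyond ligature formation, only glyphs that a break splits in two need reshaping; with a real shaper, neighbouring glyphs flagged by GlyphInfo::unsafe_to_break() would have to be reshaped as well.

```rust
/// Hypothetical ligature table for the toy shaper (a real font defines these).
const LIGATURES: &[&str] = &["<->", "->", "<-", "/*", "!="];

/// One output glyph: `cluster` is the byte offset of the text it covers.
#[derive(Debug, Clone, PartialEq)]
struct Glyph {
    cluster: usize,
    text: String,
}

/// Toy shaper: greedily forms the longest ligature at each position,
/// otherwise emits one glyph per char. `base` offsets the cluster values.
fn shape(text: &str, base: usize) -> Vec<Glyph> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < text.len() {
        let rest = &text[i..];
        let len = LIGATURES
            .iter()
            .filter(|lig| rest.starts_with(**lig))
            .map(|lig| lig.len())
            .max()
            .unwrap_or_else(|| rest.chars().next().unwrap().len_utf8());
        out.push(Glyph { cluster: base + i, text: rest[..len].to_string() });
        i += len;
    }
    out
}

/// Shape the whole span once, apply the given line breaks (sorted byte
/// offsets on char boundaries), and reshape only text whose glyph was split.
fn layout_lines(text: &str, breaks: &[usize]) -> Vec<Vec<Glyph>> {
    let shaped = shape(text, 0); // step 1: shape the entire span
    let mut bounds = vec![0];
    bounds.extend_from_slice(breaks); // step 2: the chosen line breaks
    bounds.push(text.len());
    bounds
        .windows(2)
        .map(|w| {
            let (start, end) = (w[0], w[1]);
            let mut line = Vec::new();
            let mut covered = start; // first byte not yet covered by a glyph
            for g in &shaped {
                let g_end = g.cluster + g.text.len();
                if g.cluster >= start && g_end <= end {
                    // Step 3: reshape the leftover piece of a glyph split at `start`.
                    if covered < g.cluster {
                        line.extend(shape(&text[covered..g.cluster], covered));
                    }
                    line.push(g.clone()); // glyph lies inside the line: reuse it
                    covered = g_end;
                }
            }
            // Step 3: reshape the leftover piece of a glyph split at `end`.
            if covered < end {
                line.extend(shape(&text[covered..end], covered));
            }
            line
        })
        .collect()
}
```

For example, breaking "<->" after offset 2 reuses nothing from step 1, and the reshaped first line gets the <- ligature on its own.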

This fixes the problem, at the cost of losing caching entirely.

To reintroduce caching, we need three things: the new input string, the previous input string, and the previous glyph run. We can diff the two input strings and use the result to determine which of the previous glyph clusters can be safely reused. We then reshape the parts of the input text where this is not the case, as well as any clusters that were previously reshaped because of a line break (since they may no longer sit at a line break). Finally, we perform steps 2 and 3 as before.
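A minimal sketch of the reuse idea, under strong simplifying assumptions: the diff only looks at the common prefix of the two strings, the toy shaper and ligature table are hypothetical, and the cached run is the step 1 output (shaped without line breaks), so clusters reshaped for a line break never enter it. A glyph is reused only if the MAX_LIG bytes it could depend on are unchanged; this window plays roughly the role that GlyphInfo::unsafe_to_concat() would play with a real shaper.

```rust
/// Hypothetical ligature table; MAX_LIG is its longest entry, so a glyph
/// starting at byte `p` depends only on `text[p..p + MAX_LIG]`.
const LIGATURES: &[&str] = &["<->", "->", "<-", "!="];
const MAX_LIG: usize = 3;

#[derive(Debug, Clone, PartialEq)]
struct Glyph {
    cluster: usize,
    text: String,
}

/// Toy shaper: longest ligature at each position, else one glyph per char.
fn shape(text: &str, base: usize) -> Vec<Glyph> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < text.len() {
        let rest = &text[i..];
        let len = LIGATURES
            .iter()
            .filter(|lig| rest.starts_with(**lig))
            .map(|lig| lig.len())
            .max()
            .unwrap_or_else(|| rest.chars().next().unwrap().len_utf8());
        out.push(Glyph { cluster: base + i, text: rest[..len].to_string() });
        i += len;
    }
    out
}

/// Shape `new_text`, reusing glyphs from `old_run` (the run of `old_text`)
/// whose shaping context lies entirely in the unchanged common prefix.
/// Returns the new run and the number of reused glyphs.
fn reshape_incremental(old_text: &str, old_run: &[Glyph], new_text: &str) -> (Vec<Glyph>, usize) {
    // Length of the common prefix, backed off to a char boundary.
    let mut prefix = old_text
        .bytes()
        .zip(new_text.bytes())
        .take_while(|(a, b)| a == b)
        .count();
    while !new_text.is_char_boundary(prefix) {
        prefix -= 1;
    }
    // Reuse leading glyphs whose whole context window is unchanged.
    let reused: Vec<Glyph> = old_run
        .iter()
        .take_while(|g| g.cluster + MAX_LIG <= prefix)
        .cloned()
        .collect();
    let resume = reused.last().map_or(0, |g| g.cluster + g.text.len());
    let count = reused.len();
    let mut run = reused;
    run.extend(shape(&new_text[resume..], resume)); // reshape everything after
    (run, count)
}
```

Appending "->c" to "a != b" reuses the first three glyphs and reshapes the rest, and the result is identical to shaping the new string from scratch. A fuller version would also reuse glyphs from an unchanged suffix, which requires finding a point where the old and new runs line up again.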

peppidesu • Dec 09 '25 12:12

I created a PR that should fix at least some of these: https://github.com/pop-os/cosmic-text/pull/452

adam-r-kowalski • Jan 13 '26 18:01