
Reordering of ligature substitution rules is considered harmful

Open khaledhosny opened this issue 1 year ago • 10 comments

The Feature File Specification, §5.d, states that:

A contiguous set of ligature rules does not need to be ordered in any particular way by the font editor; the implementation software must do the appropriate sorting. So:

sub f f     by f_f;
sub f i     by f_i;
sub f f i   by f_f_i;
sub o f f i by o_f_f_i;

will produce an identical representation in the font as:

sub o f f i by o_f_f_i;
sub f f i   by f_f_i;
sub f f     by f_f;
sub f i     by f_i;

There are several issues with this:

  1. It is very surprising to users, since the code has one order and the binary silently gets a different order, and the order matters as it controls which substitution is applied first,
  2. There is no way to prevent this automatic re-ordering, other than splitting each substitution to its own lookup which is wasteful and unnecessary,
  3. The sorting algorithm is undocumented, so there is no clear way to verify that implementations are implementing it compatibly.
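
To make the first point concrete, here is a minimal Python sketch of first-match ligature application. It is not how any shaper or compiler is actually implemented: `apply_ligatures` and the rule tuples are hypothetical names for illustration, and the sketch assumes that at each position the rules are tried in stored order and the first match wins.

```python
def apply_ligatures(glyphs, rules):
    """Apply ligature rules left to right; at each position the first
    rule (in list order) whose components match wins."""
    out = []
    i = 0
    while i < len(glyphs):
        for components, ligature in rules:
            n = len(components)
            if glyphs[i:i + n] == components:
                out.append(ligature)
                i += n
                break
        else:
            # No rule matched here; pass the glyph through unchanged.
            out.append(glyphs[i])
            i += 1
    return out


# Rules in the order a user might write them (shorter first).
as_written = [
    (["f", "f"], "f_f"),
    (["f", "i"], "f_i"),
    (["f", "f", "i"], "f_f_i"),
]
# The same rules with longer component sequences moved to the front.
longest_first = sorted(as_written, key=lambda rule: -len(rule[0]))

text = ["f", "f", "i"]
print(apply_ligatures(text, as_written))     # ['f_f', 'i']
print(apply_ligatures(text, longest_first))  # ['f_f_i']
```

The same input produces different output depending only on the stored order, which is why a silent reordering is visible to the user.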

I think this sorting should be deprecated and dropped, or if back-compatibility is a concern, have a way to disable it.

khaledhosny avatar Nov 16 '23 13:11 khaledhosny

I remember a time when the mantra “longer ligatures first” was important. I only found out about the re-ordering when trying to demonstrate this problem in one of my workshops.

I can see how this behavior might be considered a theoretical problem, but I think the benefits outweigh this concern. It seems natural for users to write shorter substitutions first.

That said, do you have a practical example where this re-sorting would cause actual harm?

FWIW, the sorting algorithm seems to be here: https://github.com/adobe-type-tools/afdko/blob/develop/c/makeotf/lib/hotconv/GSUB.c#L1730-L1768

frankrolf avatar Nov 16 '23 14:11 frankrolf

See https://forum.glyphsapp.com/t/prioritizing-certain-ligatures/19433/14 for an example.

khaledhosny avatar Nov 16 '23 14:11 khaledhosny

I don't see us just removing this part of the spec. Documenting the ordering requirement could be valuable, although there are a lot of things like this in the older parts of the spec and that horse may have left the barn. (We can document what AFDKO does, but that doesn't mean other implementations will update their algorithms if those differ.)

We could add a flag to disable the sorting, but that would operate on a font-wide basis.

Seems like it might be better to add some sort of "explicit" command, similar to "subtable", that blocks any reordering within a lookup at the point where it is used.

skef avatar Nov 16 '23 16:11 skef

FWIW, the sorting algorithm seems to be here: https://github.com/adobe-type-tools/afdko/blob/develop/c/makeotf/lib/hotconv/GSUB.c#L1730-L1768

This sorts by length and GID, which is double bad. Sorting by length is understandable, though misguided, but sorting by GID makes no sense.

  1. The sorting algorithm is undocumented, so there is no clear way to verify that implementations are implementing it compatibly.

Case in point, FontTools only sorts by length https://github.com/fonttools/fonttools/blob/fa59ada1b557bc304c592a2ca91c6b99ff6d241d/Lib/fontTools/otlLib/builder.py#L1570

khaledhosny avatar Nov 16 '23 21:11 khaledhosny

Is the sort by glyphId simply to ensure consistent results between different sort algos?

Lorp avatar Nov 16 '23 23:11 Lorp

I don’t think there is any point in sorting by GID, as it changes the meaning of the code and is far worse than sorting by length, since that one is at least potentially desirable.

khaledhosny avatar Nov 17 '23 14:11 khaledhosny

Right, I was assuming the sort by GID was a secondary sort after the sort by length. Still, that could be confusing if you have some equal-length subs that you need to happen in sequence.

Lorp avatar Nov 17 '23 14:11 Lorp

FontTools only sorts by length

Well, actually it sorts by length first and secondarily sorts alphabetically by the ligature component glyph names. fea-rs, I believe, sorts by length and then GID, similar to makeotf if I understand correctly. I can see situations where the sorting is undesirable altogether. Ideally one should be able to opt out. For the default behavior I suppose we should stick to one officially documented ordering.

anthrotype avatar Jan 22 '24 21:01 anthrotype

So I've been revisiting this question along with @anthrotype, because there was a slight difference in the sorting behaviour of fea-rs (rust) and feaLib (python, fonttools) for these ligature rules, and for purposes of testing we try to have these two tools generate the same output wherever it is (ahem) feasible.

Currently, fea-rs matches afdko, but feaLib uses glyph names, not GIDs, to determine the ordering within a given LigatureSet table. We are now looking at standardizing on a single sorting approach that accounts only for length and is stable (preserving the order declared in the input) for ligatures within a ligature set. That is, given the following FEA,

sub f i by f_i;
sub f f f by f_f_f;
sub f f by f_f;
sub f f i by f_f_i;

we will end up with the final ordering,

f_f_f
f_f_i
f_i
f_f
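
As a rough illustration of the proposed ordering (not the actual fea-rs or feaLib code), a stable sort on descending component count is enough. The tuple representation of rules below is an assumption made for this sketch.

```python
# Rules in declaration order: (component glyphs, ligature glyph).
rules = [
    (("f", "i"), "f_i"),
    (("f", "f", "f"), "f_f_f"),
    (("f", "f"), "f_f"),
    (("f", "f", "i"), "f_f_i"),
]

# Python's sorted() is stable, so rules with the same number of
# components keep their declaration order relative to each other.
ordered = sorted(rules, key=lambda rule: -len(rule[0]))

for _, ligature in ordered:
    print(ligature)
# f_f_f
# f_f_i
# f_i
# f_f
```

No glyph IDs or glyph names are consulted, so the result depends only on what the user wrote.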

In thinking about this, I have been trying to understand @khaledhosny's concerns about the sorting behaviour, specifically by trying to come up with some example of input text + ligature rules where the (unexpected) sorting behaviour could interfere with the designer's intentions, and I'm struggling to come up with any.

My current understanding:

  • ligatures (within a lookup) are always grouped into ligature sets, grouped by their first glyph.
  • within a ligature set, the only possible 'interference' is if one ligature is a prefix of another ligature (e.g. f f is a prefix of f f i) in which case, if it occurs earlier in the set, the longer ligature will be unreachable.
  • If ligatures are of equal length then one cannot be a prefix of the other, by definition.
  • if ligatures do not start with the same letter then they will end up in different ligature sets anyway, and declaration order is irrelevant.
  • I cannot come up with a good argument for not ordering longer ligatures ahead of shorter ones. If we were not going to do this then I think we should just drop them completely, since they are going to be unreachable and are just dead bytes.
  • the example in the spec (and quoted in the original issue here) is slightly misleading, since o f f i is going to end up in a different ligature set than f f i, and will always be applied before f f i if it occurs, since the logical cursor will match the o before seeing the f.
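
The grouping and prefix arguments above can be sketched in a few lines of Python. This is only a model of the reasoning, with hypothetical helper names (`ligature_sets`, `shadowed`), and it assumes first-match-wins within a set.

```python
from collections import defaultdict


def ligature_sets(rules):
    """Group rules by their first component glyph, keeping declaration order."""
    sets = defaultdict(list)
    for components, ligature in rules:
        sets[components[0]].append((components, ligature))
    return dict(sets)


def shadowed(ligature_set):
    """Return ligatures that can never match because an earlier rule in
    the same set is a proper prefix of their component sequence."""
    unreachable = []
    for idx, (components, ligature) in enumerate(ligature_set):
        for earlier, _ in ligature_set[:idx]:
            is_proper_prefix = (
                len(earlier) < len(components)
                and components[:len(earlier)] == earlier
            )
            if is_proper_prefix:
                unreachable.append(ligature)
                break
    return unreachable


# The spec example, in its "shorter first" order.
rules = [
    (("f", "f"), "f_f"),
    (("f", "i"), "f_i"),
    (("f", "f", "i"), "f_f_i"),
    (("o", "f", "f", "i"), "o_f_f_i"),
]
sets = ligature_sets(rules)
print(sorted(sets))         # ['f', 'o']
print(shadowed(sets["f"]))  # ['f_f_i']
print(shadowed(sets["o"]))  # []
```

Without any length sorting, `f_f_i` is unreachable in the `f` set, while `o_f_f_i` sits alone in the `o` set and is unaffected by where it was declared.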

Am I missing anything? Does anyone have an example of an input string and a set of ligature rules where the sorting behaviour would confound the designer's intentions?

I think it would be nice, if the spec is going to suggest sorting, that it define how that sorting should occur, and I think that a sorting that considers only length and otherwise respects declaration order is the simplest; but I don't think this is hugely important, since as far as I can tell it should have no impact on the shaping behaviour.

cmyr avatar Jan 23 '24 18:01 cmyr

Thanks Colin for clarifying the non-issue. We should not be talking about ordering of ligatures in general (as they appear in the feature.fea) but the order within a given ligature set keyed by first glyph, with each ligature set always necessarily sorted by the glyphID as per OpenType spec (no matter what the FEA or the font developer says). I agree that not ordering longer ligatures ahead of shorter ones may lead to some becoming unreachable -- why even bother having an f_f_i ligature if f_f would always match first?! So it makes sense to keep sorting ligatures within a set by the number of ligature components. I also now see that even for different ligatures of equal length (within a set), it doesn't really matter in which order they appear: either they will match the input string or they will not. So for these the only reason for specifying some order is consistency across implementations. We can sort by GID (like makeotf and fea-rs do), by glyph name (like fonttools does), or not sort these (equal-length ligatures with the same first glyph) but keep them in the same order as written in the FEA. I think overall the latter is the least effort for anybody, so +1 to this.
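
For comparison, the three tie-breaking options could be modelled as sort keys like this. The keys are loosely based on the descriptions in this thread and are not the actual comparators used by makeotf, fea-rs, or fontTools; the glyph order and GIDs are made up for illustration.

```python
# Hypothetical glyph order; a glyph's index in this list is its GID.
glyph_order = [".notdef", "f", "j", "l", "i", "f_l", "f_i", "f_j"]
gid = {name: index for index, name in enumerate(glyph_order)}

# Equal-length rules sharing the first glyph, in declaration order.
rules = [
    (("f", "l"), "f_l"),
    (("f", "i"), "f_i"),
    (("f", "j"), "f_j"),
]


def by_gid(rule):
    """Length first, then component GIDs."""
    components, _ = rule
    return (-len(components), [gid[g] for g in components])


def by_name(rule):
    """Length first, then component glyph names."""
    components, _ = rule
    return (-len(components), components)


def by_length_only(rule):
    """Length only; the stable sort keeps declaration order for ties."""
    components, _ = rule
    return -len(components)


for key in (by_gid, by_name, by_length_only):
    print(key.__name__, [lig for _, lig in sorted(rules, key=key)])
# by_gid ['f_j', 'f_l', 'f_i']
# by_name ['f_i', 'f_j', 'f_l']
# by_length_only ['f_l', 'f_i', 'f_j']
```

All three orderings yield the same shaping result for these rules, since none is a prefix of another; only the byte layout of the ligature set differs.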

anthrotype avatar Jan 24 '24 10:01 anthrotype