opentype.js missing emoji substitutions

Expected Behavior

There are a handful of emoji substitutions that are not found, even after #688 landed.

❤️‍🩹 should render as one glyph (1433).

Current Behavior

❤️‍🩹 renders as 3 glyphs ([ 169, 18, 1345 ]).

Possible Solution

I can make a PR with failing test cases, if that's helpful.

Steps to Reproduce (for bugs)

#️⃣ found sub [ 4, 22 ] 1520
*️⃣ found sub [ 5, 22 ] 1521
0️⃣ found sub [ 6, 22 ] 1531
1️⃣ found sub [ 7, 22 ] 1522
⛹️‍♀️ found sub [ 140, 18, 81 ] 140
⛹️‍♂️ found sub [ 140, 18, 82 ] 140
❤️‍🔥 found sub [ 169, 18, 794 ] 1432
❤️‍🩹 found sub [ 169, 18, 1345 ] 1433

I'm manually looking for substitutions to find these, like this...

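  // `font` is an opentype.js Font loaded elsewhere; `emojiData` is a list of { unicode } records.
  // All ccmp substitutions in the font, as { sub, by } entries.
  const substitutions = font.substitution.getFeature("ccmp");

  const opentypeOptions = {
    kerning: true,
    language: "dflt",
    features: [{ script: "DFLT", tags: ["ccmp", "liga"] }],
  };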

  for (const emoji of emojiData) {
    const { unicode } = emoji;
    const glyphs = font.stringToGlyphs(unicode, opentypeOptions);
    let glyph;
    if (glyphs.length === 1) {
      glyph = glyphs[0];
    } else {
      const indexes = glyphs.map((glyph) => glyph.index);
      const sub = substitutions.find((substitution) => equals(substitution.sub, indexes));

      if (sub) {
        glyph = font.glyphs.get(sub.by);
        console.log(unicode, "found sub", indexes, sub.by);
      } else {
        console.log(unicode, "no ccmp sub", indexes);
      }
    }
  }

/** Custom equals function that can also check lists. */
function equals(a, b) {
  if (a === b) {
    return true;
  } else if (Array.isArray(a) && Array.isArray(b)) {
    if (a.length !== b.length) {
      return false;
    }
    for (let i = 0; i < a.length; i += 1) {
      if (!equals(a[i], b[i])) {
        return false;
      }
    }
    return true;
  } else {
    return false;
  }
}

Context

Using noto-emoji in our CAD app, https://cuttle.xyz

Your Environment

Version used: be0d4417a04d92d43178e075273048e926164abf
Font used: noto-emoji v47
Browser Name and version: Node
Operating System and version (desktop or mobile):
Link to your project:

Apr 15 '24 07:04 forresto

@TonyJR would you be available to have a look at this, as you implemented the ccmp feature?

Apr 15 '24 08:04 Connum

@TonyJR would you be available to have a look at this, as you implemented the ccmp feature?

Yes, I'm looking into this. I found the rule for "#️⃣ found sub [ 4, 22 ] 1520":

sub numbersign uni20E3 by keycap_hash;

It should be a GSUB type 4.1 (ligature substitution) lookup. I will look for the cause.

Apr 15 '24 08:04 TonyJR

Here are the ones that should result in one glyph, but return multiple.

[
{"string":"#️⃣","indexes":[4,23],"expected":1548},
{"string":"*️⃣","indexes":[5,23],"expected":1549},
{"string":"0️⃣","indexes":[6,23],"expected":1559},
{"string":"1️⃣","indexes":[7,23],"expected":1550},
{"string":"2️⃣","indexes":[8,23],"expected":1551},
{"string":"3️⃣","indexes":[9,23],"expected":1552},
{"string":"4️⃣","indexes":[10,23],"expected":1553},
{"string":"5️⃣","indexes":[11,23],"expected":1554},
{"string":"6️⃣","indexes":[12,23],"expected":1555},
{"string":"7️⃣","indexes":[13,23],"expected":1556},
{"string":"8️⃣","indexes":[14,23],"expected":1557},
{"string":"9️⃣","indexes":[15,23],"expected":1558},
{"string":"🏋️‍♀️","indexes":[447,18,82],"expected":447},
{"string":"🏋️‍♂️","indexes":[447,18,83],"expected":447},
{"string":"🏌️‍♀️","indexes":[448,18,82],"expected":448},
{"string":"🏌️‍♂️","indexes":[448,18,83],"expected":448},
{"string":"🏳️‍🌈","indexes":[485,18,256],"expected":1871},
{"string":"🏳️‍⚧️","indexes":[485,18,116],"expected":1872},
{"string":"👁️‍🗨️","indexes":[566,18,886],"expected":1432},
{"string":"👨‍❤️‍👨","indexes":[605,18,170,18,605],"expected":646},
{"string":"👨‍❤️‍💋‍👨","indexes":[605,18,170,18,640,18,605],"expected":644},
{"string":"👩‍❤️‍👨","indexes":[606,18,170,18,605],"expected":646},
{"string":"👩‍❤️‍👩","indexes":[606,18,170,18,606],"expected":646},
{"string":"👩‍❤️‍💋‍👨","indexes":[606,18,170,18,640,18,605],"expected":644},
{"string":"👩‍❤️‍💋‍👩","indexes":[606,18,170,18,640,18,606],"expected":644},
{"string":"🕵️‍♀️","indexes":[855,18,82],"expected":855},
{"string":"🕵️‍♂️","indexes":[855,18,83],"expected":855},
{"string":"⛹️‍♀️","indexes":[141,18,82],"expected":141},
{"string":"⛹️‍♂️","indexes":[141,18,83],"expected":141},
{"string":"❤️‍🔥","indexes":[170,18,795],"expected":1433},
{"string":"❤️‍🩹","indexes":[170,18,1346],"expected":1434},
]
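
A list like this could be turned into failing test cases along the following lines. This is only a sketch: `findFailures` and `stubShape` are hypothetical names, not part of opentype.js, and the shaper is passed in as a plain function so the example runs without a font file.

```javascript
// Hypothetical helper: `shape` is any function that maps a string to an
// array of glyph indexes. A case passes only if shaping yields exactly one
// glyph and that glyph is the expected one.
function findFailures(cases, shape) {
  return cases.filter(({ string, expected }) => {
    const indexes = shape(string);
    return !(indexes.length === 1 && indexes[0] === expected);
  });
}

// Stub shaper standing in for a real font; it only exists to make the
// sketch runnable on its own.
const stubShape = (s) => (s === "ok" ? [1] : [4, 23]);

const sampleCases = [
  { string: "ok", indexes: [1], expected: 1 },
  { string: "#\uFE0F\u20E3", indexes: [4, 23], expected: 1548 },
];

const failures = findFailures(sampleCases, stubShape);
console.log(failures.map((c) => c.string));
```

With a loaded font, the shaper would presumably be something like `(s) => font.stringToGlyphs(s, opentypeOptions).map((g) => g.index)`, reusing the options from the original report.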

Apr 15 '24 09:04 forresto

I found the reason! You have input a "fully-qualified" emoji, and the font does not support it.
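
To make "fully-qualified" concrete, here is a small sketch that lists the code points of ❤️‍🩹 with and without the variation selector U+FE0F. The helper name `codePoints` is made up for illustration; the sequences follow the Unicode emoji data, where the form with U+FE0F is the fully-qualified one.

```javascript
// List the code points of a string as "U+XXXX" labels.
// Array.from iterates by code point, so astral characters stay whole.
function codePoints(str) {
  return Array.from(
    str,
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// HEAVY BLACK HEART + VS16 + ZWJ + ADHESIVE BANDAGE (fully-qualified)
const fullyQualified = "\u2764\uFE0F\u200D\u{1FA79}";
// Same sequence without the variation selector (unqualified)
const unqualified = "\u2764\u200D\u{1FA79}";

console.log(codePoints(fullyQualified).join(" "));
console.log(codePoints(unqualified).join(" "));
```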

Apr 15 '24 10:04 TonyJR

WTF! Figma draws it correctly. I'm going to find out why.

Apr 15 '24 11:04 TonyJR

@TonyJR any progress on this?

Apr 24 '24 18:04 Connum

@TonyJR any progress on this?

Sorry, I've been a bit busy lately. \uFE00-\uFE0F are variation selectors, which should be handled in cmap. I tested HarfBuzz, and it skips these characters. I see two possible solutions to the bug:

1. Process cmap before processing GSUB, then remove the variation selectors.
2. Skip them when processing GSUB.

I prefer the first option. @Connum, are you familiar with cmap?
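
As a rough illustration of what "removing" or "skipping" the variation selectors means, here is a string-level sketch. It is not how opentype.js handles them internally, and `stripVariationSelectors` is a hypothetical helper. It also throws away any glyph variant the selector might have chosen through cmap, which is exactly why the order of processing matters.

```javascript
// Matches the variation selectors U+FE00 through U+FE0F.
const VARIATION_SELECTORS = /[\uFE00-\uFE0F]/g;

// Hypothetical helper: drop variation selectors so that GSUB rules written
// against the bare sequence (e.g. "numbersign uni20E3") can match.
function stripVariationSelectors(text) {
  return text.replace(VARIATION_SELECTORS, "");
}

const keycapHash = "#\uFE0F\u20E3"; // "#" + VS16 + COMBINING ENCLOSING KEYCAP
const stripped = stripVariationSelectors(keycapHash);
console.log(stripped.length);
```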

Apr 25 '24 10:04 TonyJR

I implemented a special handling of variation selectors some time ago, maybe that's interfering? And the order of processing should be stated in the docs. As far as I remember, cmap should be handled before any layout is applied.

Apr 25 '24 11:04 Connum

Yes, you are right. I'm trying to find the correct order, but I would rather refer directly to the HarfBuzz source code. I found that HarfBuzz actually merges the features in GSUB/GPOS and processes them together. Perhaps we should follow its approach too, but that may be a big project...

Apr 25 '24 12:04 TonyJR
