opentype.js Support multi-character emoji

This issue with monochrome Noto Emoji is distinct from the color emoji issue (#193).

#338 added support for non-Basic-Multilingual-Plane (BMP) characters, but uses Array.from, which doesn't account for combined emoji.

It seems that Opentype.js has the glyph information needed, but the initial text-to-glyph translation is the issue:

https://opentype.js.org/glyph-inspector.html

Expected Behavior

Calling notoEmojiFont.draw(context, "👨‍👩‍👧‍👦") should render

Current Behavior

Calling notoEmojiFont.draw(context, "👨‍👩‍👧‍👦") renders

Possible Solution

If "ccmp" is not supported yet and would cover this, this issue can be closed as a duplicate of https://github.com/opentypejs/opentype.js/issues/443.
Intl.Segmenter is a native solution, but isn't supported by Firefox yet.

const splitSegmentArray = (string) => Array.from(new Intl.Segmenter().segment(string)).map(x => x.segment);
console.log(splitSegmentArray("😅👨‍👩‍👧‍👦💖👩‍💻💔👩‍🌾🧡👨🏽‍🌾💜🖖🏾🌈"))

graphemer is a library-based solution. (It is a fairly big library.)
twemoji-parser is focused on parsing emoji sequences, so it's smaller than graphemer.

Steps to Reproduce (for bugs)

Live demo: https://gm69qn.csb.app

Call notoEmojiFont.stringToGlyphs("👨‍👩‍👧‍👦") and get glyphs for "👨👩👧👦" interspersed with the combiner ("uni200D") instead of the one glyph for the combined family.

Same for other combined emoji, like 👩‍💻, 👩‍🌾, 👨🏽‍🌾, 🖖🏾

Context

We're adding support for emoji to Cuttle CAD, which can render various fonts as vectors for laser cutting, etc.

Your Environment

Version used: 1.3.4
Font used: Noto Emoji (ttf)
Browser Name and version: Various tested
Operating System and version (desktop or mobile): Mac OS desktop
Link to your project: https://gm69qn.csb.app

May 02 '22 11:05 forresto

It seems like font.tables.gsub has the ligatureSets info needed to combine these. Is that something that I can enable with an option?

notoEmojiFont.substitution.getFeature("ccmp") // Array(3640)

The feature tag is "ccmp" ... I'm not seeing that called with defaults via getFeature or getMultiple, though there are some tests. 🤔

If "ccmp" is not supported yet, this can be closed as a duplicate of #443.

May 03 '22 10:05 forresto

Looking at #443 I thought this was worth a try:

notoEmojiFont.substitution.add(
  "ccmp", 
  notoEmojiFont.substitution.getFeature('ccmp')
);

but got:

Error: Ligature: unable to modify coverage table format 2

May 09 '22 09:05 forresto

In addition to the ccmp substitutions, variation selectors (https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)) need to be taken into account. For example, "☠" vs "☠️".
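
To make the variation selector point concrete, here is a minimal sketch, assuming it is acceptable to drop the selectors (U+FE00 to U+FE0F) before glyph lookup. The helper name stripVariationSelectors is hypothetical and not part of opentype.js, and dropping the selector discards the text-vs-emoji presentation hint, so this is only reasonable for a font with a single presentation per character.

```javascript
// Hypothetical helper, not part of opentype.js.
// Removes variation selectors VS1..VS16 (U+FE00..U+FE0F) from a string.
const VARIATION_SELECTORS = /[\uFE00-\uFE0F]/g;

function stripVariationSelectors(text) {
  return text.replace(VARIATION_SELECTORS, "");
}

const plain = "\u2620";       // ☠ on its own
const emoji = "\u2620\uFE0F"; // ☠ followed by VS16 (emoji presentation)

// The two strings look alike but differ by one code unit.
console.log(plain.length, emoji.length);               // 1 2
console.log(stripVariationSelectors(emoji) === plain); // true
```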

May 11 '22 14:05 forresto

I'm also looking for a workaround for this. It would be nice to either support it or have a documented workaround.

May 16 '22 05:05 jamesjoung

Here's my workaround.

// Opentype.js doesn't actually support these substitutions, so we'll have to
// search them manually
const substitutions = font.substitution.getFeature("ccmp");

function emojiToGlyph(emojiString) {
  const glyphs = font
    .stringToGlyphs(emojiString)
    // Discarding these makes the substitution search work for emoji sequences
    // with variation selectors
    // https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)
    .filter((glyph) => glyph.index <= 1850);
  let glyph;
  if (glyphs.length === 1) {
    glyph = glyphs[0];
  } else if (glyphs.length > 1) {
    const indexes = glyphs.map((glyph) => glyph.index);
    const sub = substitutions.find((substitution) => equals(substitution.sub, indexes));
    if (sub) {
      glyph = font.glyphs.get(sub.by);
    }
  }
  if (glyph) {
    return glyph;
  } else {
    throw new Error(`${emojiString} - couldn't find a glyph :(`);
  }
}

emojiToGlyph("👨‍👩‍👧‍👦");

/** Custom equals function that can also check lists. */
function equals(a, b) {
  if (a === b) {
    return true;
  } else if (Array.isArray(a) && Array.isArray(b)) {
    if (a.length !== b.length) {
      return false;
    }
    for (let i = 0; i < a.length; i += 1) {
      if (!equals(a[i], b[i])) {
        return false;
      }
    }
    return true;
  } else {
    return false;
  }
}

Caveats:

This only works for one emoji. To replace the glyphs in an arbitrary string, we would also need tokenizer logic.

Only tested with Noto Emoji.
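
For the tokenizer caveat, one possible sketch is to split the string into grapheme clusters with Intl.Segmenter (mentioned in the original post) and run each cluster through a per-emoji lookup such as emojiToGlyph above. The helper name mapGraphemes is hypothetical, and this assumes a runtime that ships Intl.Segmenter.

```javascript
// Hypothetical helper, not part of opentype.js.
// Splits text into grapheme clusters and maps each cluster through a
// caller-supplied lookup (for example, the emojiToGlyph workaround).
function mapGraphemes(text, toGlyph) {
  if (typeof Intl === "undefined" || typeof Intl.Segmenter !== "function") {
    throw new Error("Intl.Segmenter is not available in this runtime");
  }
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  // Each item yielded by segment() has a `segment` string property.
  return Array.from(segmenter.segment(text), ({ segment }) => toGlyph(segment));
}

// Illustration with an identity lookup: the family emoji stays one cluster.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(mapGraphemes("a" + family + "b", (cluster) => cluster).length);

// With a loaded font, usage would look like:
//   const glyphs = mapGraphemes(someString, emojiToGlyph);
```

Note that emojiToGlyph throws when it finds no glyph, so a real caller would probably want a fallback for clusters the font does not cover.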

May 16 '22 18:05 forresto

Here are the different options: https://medium.com/making-faces-and-other-emoji/emoji-fonts-technically-40f3fdc0869e I'd recommend at least supporting COLR/CPAL, as it's probably the most widely supported format and one of the most commonly implemented in fonts. It would probably also be a good idea to implement CBDT/CBLC support.

Nov 20 '22 01:11 ILOVEPIE

ccmp looks like a feature that is always applied. It isn't displayed in the feature list, but it always runs before the text is decoded. https://learn.microsoft.com/en-us/typography/script-development/standard Maybe we can add a preprocessing step in Font.stringToGlyphs()?
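
A rough sketch of what such a preprocessing step might look like, operating on plain glyph indexes. This is not an opentype.js API: applyLigatures is a hypothetical name, and the { sub, by } entry shape is assumed from what the workaround above reads out of getFeature("ccmp"). Real GSUB processing (lookup order, contextual rules) is more involved than this single greedy pass.

```javascript
// Hypothetical preprocessing pass; not an opentype.js API.
// `substitutions` is assumed to be a list of { sub: number[], by: number }
// entries, the shape the workaround above reads from getFeature("ccmp").
function applyLigatures(indexes, substitutions) {
  // Try longer sequences first so a full sequence wins over a shorter prefix.
  const ordered = substitutions
    .filter((s) => Array.isArray(s.sub) && s.sub.length > 0)
    .sort((a, b) => b.sub.length - a.sub.length);
  const out = [];
  let i = 0;
  while (i < indexes.length) {
    const match = ordered.find((s) =>
      s.sub.every((glyphIndex, k) => indexes[i + k] === glyphIndex));
    if (match) {
      out.push(match.by); // replace the whole run with the ligature glyph
      i += match.sub.length;
    } else {
      out.push(indexes[i]); // no rule starts here; keep the glyph as-is
      i += 1;
    }
  }
  return out;
}

// Made-up glyph indexes, for illustration only.
const subs = [
  { sub: [10, 3, 11], by: 100 },
  { sub: [10, 3, 11, 3, 12], by: 200 },
];
console.log(applyLigatures([5, 10, 3, 11, 3, 12, 6], subs)); // [ 5, 200, 6 ]
```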

Mar 18 '24 09:03 TonyJR
