Emoji Rendering Discrepancy Between Inline and Block Elements
Version of Marp Tool
v1.0.0
Operating System
Linux
Environment
Running in a Docker container. (The latest version 2.0.4 seems to suffer from the same issue.)
How to reproduce
Create the following Markdown file named slide-deck.md:
---
---
<span>Inline: &#128578;</span>

<div>Block: &#128578;</div>
Run the following CLI command with Docker to generate HTML:
$ docker run --rm -v $PWD:/home/marp/app/ -e LANG=$LANG -e MARP_USER="$(id -u):$(id -g)" marpteam/marp-cli:v1.0.0 slide-deck.md
Run the following CLI command with Docker to generate PDF:
$ docker run --rm -v $PWD:/home/marp/app/ -e LANG=$LANG -e MARP_USER="$(id -u):$(id -g)" marpteam/marp-cli:v1.0.0 slide-deck.md --pdf
Expected behavior
Both emojis are rendered the same way and are visible in the resulting PDF document.
Actual behavior

The inline emoji is rendered as an image element: <img class="emoji" draggable="false" alt="🙂" src="https://twemoji.maxcdn.com/2/svg/1f642.svg" data-marp-twemoji="">, while the block element emoji is rendered literally: 🙂
This is a problem when targeting PDF as the output format.

Additional information
No response
https://markdown-it.github.io/#md3=%7B%22source%22%3A%22%3Cspan%3EInline%3A%20%26%23128578%3B%3C%2Fspan%3E%5Cn%5Cn%3Cdiv%3EBlock%3A%20%26%23128578%3B%3C%2Fdiv%3E%22%2C%22defaults%22%3A%7B%22html%22%3Afalse%2C%22xhtmlOut%22%3Afalse%2C%22breaks%22%3Afalse%2C%22langPrefix%22%3A%22language-%22%2C%22linkify%22%3Atrue%2C%22typographer%22%3Atrue%2C%22_highlight%22%3Atrue%2C%22_strict%22%3Atrue%2C%22_view%22%3A%22debug%22%7D%7D
The markdown-it token stream (AST) for the provided example looks like this:
[
{
"type": "paragraph_open",
"tag": "p",
"attrs": null,
"map": [
0,
1
],
"nesting": 1,
"level": 0,
"children": null,
"content": "",
"markup": "",
"info": "",
"meta": null,
"block": true,
"hidden": false
},
{
"type": "inline",
"tag": "",
"attrs": null,
"map": [
0,
1
],
"nesting": 0,
"level": 1,
"children": [
{
"type": "html_inline",
"tag": "",
"attrs": null,
"map": null,
"nesting": 0,
"level": 0,
"children": null,
"content": "<span>",
"markup": "",
"info": "",
"meta": null,
"block": false,
"hidden": false
},
{
"type": "text",
"tag": "",
"attrs": null,
"map": null,
"nesting": 0,
"level": 0,
"children": null,
"content": "Inline: 🙂",
"markup": "&#128578;",
"info": "entity",
"meta": null,
"block": false,
"hidden": false
},
{
"type": "html_inline",
"tag": "",
"attrs": null,
"map": null,
"nesting": 0,
"level": 0,
"children": null,
"content": "</span>",
"markup": "",
"info": "",
"meta": null,
"block": false,
"hidden": false
}
],
"content": "<span>Inline: &#128578;</span>",
"markup": "",
"info": "",
"meta": null,
"block": true,
"hidden": false
},
{
"type": "paragraph_close",
"tag": "p",
"attrs": null,
"map": null,
"nesting": -1,
"level": 0,
"children": null,
"content": "",
"markup": "",
"info": "",
"meta": null,
"block": true,
"hidden": false
},
{
"type": "html_block",
"tag": "",
"attrs": null,
"map": [
2,
3
],
"nesting": 0,
"level": 0,
"children": null,
"content": "<div>Block: &#128578;</div>",
"markup": "",
"info": "",
"meta": null,
"block": true,
"hidden": false
}
]
Marp Core transforms an emoji found in the content of an inline markdown-it token into a marp_unicode_emoji token, and renders each marp_unicode_emoji token as a Twemoji SVG image.
https://github.com/marp-team/marp-core/blob/5c5eda0fb7ea9a202a3b0345202272bb0d9a457f/src/emoji/emoji.ts#L76-L109
On the other hand, the block element and its children are parsed as a single html_block token. Marp Core does not transform emojis within an html_block token because doing so may break raw HTML elements in some cases.
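As a rough illustration of the difference, here is a minimal Python sketch. It is not Marp Core's actual implementation (that is the TypeScript linked above). The names EMOJI, twemoji_img, and render are hypothetical, the token list is a trimmed-down version of the one shown earlier, and the emoji matcher is a deliberately narrow assumption. The first part replaces emoji in text tokens only; the second part shows what a blind replacement could do to raw HTML.

```python
import re

# Assumption: this narrow code point range stands in for real emoji detection.
EMOJI = re.compile("[\U0001F300-\U0001FAFF]")


def twemoji_img(match):
    """Return an <img> tag shaped like the one quoted in this report."""
    emoji = match.group(0)
    return (
        f'<img class="emoji" draggable="false" alt="{emoji}" '
        f'src="https://twemoji.maxcdn.com/2/svg/{ord(emoji):x}.svg" '
        'data-marp-twemoji="">'
    )


def render(tokens):
    """Render a simplified token list, replacing emoji in text tokens only."""
    out = []
    for token in tokens:
        if token["type"] == "inline":
            for child in token["children"]:
                if child["type"] == "text":
                    # Plain text is safe to rewrite.
                    out.append(EMOJI.sub(twemoji_img, child["content"]))
                else:
                    # html_inline: raw HTML, passed through untouched.
                    out.append(child["content"])
        else:
            # html_block: raw HTML, emitted as-is.
            out.append(token["content"])
    return "".join(out)


# Simplified tokens (paragraph_open/paragraph_close omitted for brevity).
tokens = [
    {"type": "inline", "children": [
        {"type": "html_inline", "content": "<span>"},
        {"type": "text", "content": "Inline: 🙂"},
        {"type": "html_inline", "content": "</span>"},
    ]},
    {"type": "html_block", "content": "<div>Block: 🙂</div>"},
]
rendered = render(tokens)
print(rendered)

# Blindly applying the same replacement to raw HTML can corrupt it:
# the quotes inside the <img> tag end the JavaScript string literal early.
broken = EMOJI.sub(twemoji_img, '<script>document.title = "🙂";</script>')
print(broken)
```

Only the emoji inside the text token becomes an image, which matches the behavior reported in this issue. The script example shows why simply extending the replacement to html_block content is not safe.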
To transform emojis in html_block tokens correctly, we would need to implement a robust HTML parser and entity resolver that work in both Node.js and the browser. Unfortunately, we have not implemented them yet because of a number of concerns:
- An html_block token may contain only part of a complete HTML block, so well-known HTML-compliant parsers, such as the browser's DOMParser, htmlparser2, and parse5, cannot be used in our use case.
  <div class="😄">

  # Markdown content 👍

  </div>
  In the above case, the raw HTML is split into two html_block tokens, <div class="😄"> and </div>. When we try to parse and transform these fragments with a known parser, the opening element is unnecessarily closed because of the HTML-compliant auto-closing of tags, and parsing the closing element fails as invalid HTML.
- If a simple string replacement is applied, the raw HTML block may break in some edge cases.
- Raw JS: <script>document.title = "🙂";</script> ➡️ <script>document.title = "<img class="emoji" draggable="false" alt="🙂" src="https://twemoji.maxcdn.com/2/svg/1f642.svg" data-marp-twemoji="">";</script>
- Raw JS: