anchor-markdown-header
Emoji in header breaks generated link
Continued from https://github.com/thlorenz/doctoc/issues/123
In short: If you place an emoji into a header, the generated anchor tag does not work.
Current generated output:
- [Modules 📦](#modules-%F0%9F%93%A6)
The actual link generated by GitHub just leaves out the emoji, but has a dash in there for the space.
So I actually tried it out with a heading like this:
# Modules 📦
And the Markdown used to make it work:
- [Modules 📦](#modules-)
You can see it here: https://github.com/adrianmcli/next-boilerplate/blob/master/README.md
I took out the emoji from the TOC though, just because I didn't think it looked good.
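For reference, the `%F0%9F%93%A6` in the generated link above is the percent-encoded UTF-8 form of 📦. Here is a minimal sketch of how that kind of output can come about, assuming a hypothetical `naiveAnchor` helper that lowercases the header, turns spaces into dashes, and URI-encodes the rest. It is an illustration only and not necessarily how the generator works internally.

```javascript
// Hypothetical helper for illustration only: lowercase the header text,
// turn spaces into dashes, then percent-encode whatever is left.
function naiveAnchor(headerText) {
  const slug = headerText.toLowerCase().replace(/ /g, '-');
  return '#' + encodeURIComponent(slug);
}

// \u{1F4E6} is the package emoji. It survives slugging and is then
// encoded as its four UTF-8 bytes.
console.log(naiveAnchor('Modules \u{1F4E6}')); // #modules-%F0%9F%93%A6
```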
OK, thanks for that info. So what we'd need to do is replace any emoji with a dash as well. Could you research how to detect any emoji in a regex? I've got no clue ;)
You can test things by forking and modifying this repo. You'll get the quickest feedback if you just add a failing test here with an emoji. Then you just try to modify the regex in use until the test passes.
Finally you can PR with your changes. I'd greatly appreciate it. Thanks.
I don't think it becomes a space. It kind of just disappears. The dash is there because I had a space in between the emoji and the word.
For example, this:
# Modu📦les
Would become:
- [Modu📦les](#modules)
So we need to:
- Include the emoji as if it was an actual character/word.
- Convert spaces to dashes (regular conversion).
- Strip out the emoji.
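The steps above could be sketched roughly like this. It assumes the built-in `\p{Extended_Pictographic}` Unicode property as a stand-in emoji matcher, which does not cover every emoji sequence, and `sketchAnchor` is a hypothetical name rather than anything in the repo.

```javascript
// Sketch only: \p{Extended_Pictographic} is a stand-in for a real emoji
// matcher and does not handle every emoji sequence.
const EMOJI = /\p{Extended_Pictographic}/gu;

function sketchAnchor(headerText) {
  return '#' + headerText
    .toLowerCase()
    .replace(EMOJI, '')   // strip the emoji, leaving its neighbours untouched
    .replace(/ /g, '-');  // regular space-to-dash conversion
}

// \u{1F4E6} is the package emoji used in the examples above.
console.log(sketchAnchor('Modules \u{1F4E6}')); // #modules-
console.log(sketchAnchor('Modu\u{1F4E6}les'));  // #modules
```

The space before the emoji still becomes a dash, which is why the first result ends in `-`.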
Sounds good .. basically just add tests that assume it's doing all that and then make 'em pass.
Quite easy really :P The main challenge is how to match emojis with a regex.
Apparently, detecting emojis with regex is actually really hard. Maybe this package can help: https://github.com/mathiasbynens/emoji-regex
@thlorenz PR submitted, yay! #37
There seem to be some emojis that are not completely captured by this solution

Didn't get around to analyzing what makes this one special. Possibly related to this issue in the library you're using: https://github.com/mathiasbynens/emoji-regex/issues/28
⚒
Hope we can catch those as well .. don't have bandwidth to attack this myself, but maybe you can help (PRs appreciated). Also @mathiasbynens should be considered the emoji regexpert (new word) so whatever he says in his issue would very likely be correct :)
Didn't get around to analyze what makes this one special. Possibly related to the issue in the library you're using mathiasbynens/emoji-regex#28
Yeah, that’s likely the issue. emoji-regex follows the Unicode standard, detecting only official emoji sequences. Apple’s macOS emoji picker randomly inserts U+FE0F after certain emoji despite that resulting in a non-standard sequence.
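To illustrate how a trailing U+FE0F can survive stripping, here is a small sketch. Both patterns are stand-ins built on the `\p{Extended_Pictographic}` property, not the actual emoji-regex pattern, and the sample string is made up for the example.

```javascript
// U+2692 (hammer and pick) followed by U+FE0F (variation selector-16).
const sample = 'Tools \u2692\uFE0F';

// Stand-in patterns for illustration; neither is emoji-regex itself.
const baseOnly = /\p{Extended_Pictographic}/gu;
const baseAndSelector = /\p{Extended_Pictographic}\uFE0F?/gu;

// Matching only the base character leaves the invisible selector behind.
const leftover = sample.replace(baseOnly, '');
// Allowing an optional selector after the base removes both.
const clean = sample.replace(baseAndSelector, '');

console.log(leftover.endsWith('\uFE0F')); // true
console.log(clean === 'Tools ');          // true
```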
Why would you want to strip emojis, though? They’re perfectly valid in IDs and #foo-style in-page anchors. IMHO, a better fix would leave emojis intact and make sure the links are working instead.
@mathiasbynens unfortunately, that's just how the header links are generated by GitHub.
## my title🕵️here
Generates this anchor:
#my-titlehere
Should we instead try an opt-in method? That might be a big change though.
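An opt-in (or opt-out) switch might look something like the sketch below. The function name `anchorWithOption` and the `stripEmoji` option are hypothetical and do not exist in the package. The emoji pattern is again a simplified stand-in.

```javascript
// Hypothetical option name `stripEmoji`; this is not an existing API.
function anchorWithOption(headerText, { stripEmoji = true } = {}) {
  let slug = headerText.toLowerCase();
  if (stripEmoji) {
    // Simplified stand-in matcher: base emoji plus an optional U+FE0F.
    slug = slug.replace(/\p{Extended_Pictographic}\uFE0F?/gu, '');
  }
  return '#' + slug.replace(/ /g, '-');
}

// \u{1F575}\uFE0F is the detective emoji from the header above.
const title = 'my title\u{1F575}\uFE0Fhere';
console.log(anchorWithOption(title)); // #my-titlehere
console.log(
  anchorWithOption(title, { stripEmoji: false }) ===
    '#my-title\u{1F575}\uFE0Fhere'
); // true
```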
tbh I can live with the current behaviour. Just wanted to add the comment here for future generations to see. Depending on whether the issue gets fixed in the underlying library, I'd recommend a Known Issues section in the readme though :) I'll try to stick around and PR the docs if needed.
This never got resolved! Still breaks for me. Can we just drop the emoji or give me an option to?
I use emojis in headers here: https://github.com/Miserlou/dnd-tldr
@Miserlou I think your links will work if you just remove the emojis from the links. I tried modifying the URL fragment on one of your broken links and it worked.
If I remove the emojis from the links then it breaks in other apps like VSCode's markdown preview. I would like it to work on GitHub and other apps 🤔
The biggest problem is that the behavior is inconsistent. For example:
- # Alarm clock ⏰ has #alarm-clock- as its link.
- # Apple Watch ⌚️ has #apple-watch-%EF%B8%8F as its link.
This gets quite annoying when automatically generating table of contents, such as I do here: https://github.com/basnijholt/home-assistant-config/blob/35f3ae3942c5d343efe133fccd85415d4bdf6501/README.md#automations---table-of-content
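One way to see where the difference comes from is to list the code points in each header. This sketch assumes the alarm clock header holds a bare U+23F0 while the watch header holds U+231A followed by U+FE0F, which is what the two links suggest. `codePoints` is a hypothetical helper.

```javascript
// List the code points in a string, e.g. ['U+231A', 'U+FE0F'].
function codePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

console.log(codePoints('\u23F0'));         // [ 'U+23F0' ]
console.log(codePoints('\u231A\uFE0F'));   // [ 'U+231A', 'U+FE0F' ]

// The selector left behind is what shows up percent-encoded in the link.
console.log(encodeURIComponent('\uFE0F')); // %EF%B8%8F
```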
Is there any news on this issue? This works in VSCode for me:
[React](#⚛️-React)
but doesn't on GitHub. It tries to access this URL:
#⚛%EF%B8%8F-React
I'm getting these %EF%B8%8F sequences, which render as empty characters. They show up in the inspector, in the link shown on hover, and in the URL that the link goes to. [screenshots not included]
That is happening with multiple emojis: ⚠️✍️⏲️🛠️⚙️☝️⚡️ and others...
But as others said, I'd prefer that the emojis were just kept as is rather than removed, because removing them breaks the links in other software.
#45 should address this by taking a new version of emoji-regex