anchor-markdown-header icon indicating copy to clipboard operation
anchor-markdown-header copied to clipboard

Emoji in header breaks generated link

Open adrianmcli opened this issue 8 years ago • 17 comments

Continued from https://github.com/thlorenz/doctoc/issues/123

In short: If you place an emoji into a header, the generated anchor tag does not work.

Current generated output:

- [Modules 📦](#modules-%F0%9F%93%A6)

The actual link generated by GitHub just leaves out the emoji, but has a dash in there for the space.

So I actually tried it out with a heading like this:

# Modules 📦

And the Markdown used to make it work:

- [Modules 📦](#modules-)

You can see it here: https://github.com/adrianmcli/next-boilerplate/blob/master/README.md

I took out the emoji from the TOC though, just because I didn't think it looked good.

adrianmcli avatar Feb 16 '17 04:02 adrianmcli

OK thanks for that info, so what we'd need to do is replace any emoj with a dash as well. Could you research how to detect any emoj in a regex? I've got no clue ;)

You can test things by forking and modifying this repo. You'll get the quickes feedback if you just add a failing test here with an emoj. Then you just try to modify the used regex until the test passes.

Finally you can PR with your changes. I'd greatly appreciate it. Thanks.

thlorenz avatar Feb 16 '17 15:02 thlorenz

I don't think it becomes a space. It kind of just disappears. The dash is there because I had a space in between the emoji and the word.

For example, this:

# Modu📦les

Would become:

- [Modu📦les](#modules)

adrianmcli avatar Feb 16 '17 15:02 adrianmcli

So we need to:

  1. Include the emoji as if it was an actual character/word.
  2. Convert spaces to dashes (regular conversion).
  3. Strip out the emoji.

adrianmcli avatar Feb 16 '17 15:02 adrianmcli

Sounds good .. basically just add tests that assume it's doing all that and then make'em pass.

Quite easy really :P main challenge how do you regex match emojis

thlorenz avatar Feb 16 '17 15:02 thlorenz

Apparently, detecting emojis with regex is actually really hard. Maybe this package can help: https://github.com/mathiasbynens/emoji-regex

adrianmcli avatar Feb 16 '17 15:02 adrianmcli

@thlorenz PR submitted, yay! #37

adrianmcli avatar Feb 16 '17 16:02 adrianmcli

There seem to be some emojis that are not completely captured by this solution image

Didn't get around to analyze what makes this one special. Possibly related to the issue in the library you're using https://github.com/mathiasbynens/emoji-regex/issues/28

anoff avatar Aug 14 '17 15:08 anoff

Hope we can catch those as well .. don't have bandwidth to attack this myself, but maybe you can help (PRs appreciated). Also @mathiasbynens should be considered the emoji regexpert (new word) so whatever he says in his issue would very likely be correct :)

thlorenz avatar Aug 14 '17 22:08 thlorenz

Didn't get around to analyze what makes this one special. Possibly related to the issue in the library you're using mathiasbynens/emoji-regex#28

Yeah, that’s likely the issue. emoji-regex follows the Unicode standard, detecting only official emoji sequences. Apple’s macOS emoji picker randomly inserts U+FE0F after certain emoji despite that resulting in a non-standard sequence.

Why would you want to strip emojis, though? They’re perfectly valid in IDs and #foo-style in-page anchors. IMHO, a better fix would leave emojis intact and make sure the links are working instead.

mathiasbynens avatar Aug 15 '17 05:08 mathiasbynens

@mathiasbynens unfortunately, that's just how the header links are generated by GitHub.

## my title🕵️here

Generates this anchor:

#my-titlehere

Should we instead try an opt-in method? That might be a big change though.

adrianmcli avatar Aug 15 '17 06:08 adrianmcli

tbh I can live with the current behaviour. Just wanted to add the comment here for future generations to see. Depending on whether the issue gets fixed in the underlying library I'd recommend a Known Issues in the readme though :) I'll try to stick around and PR the docs if needed.

anoff avatar Aug 15 '17 07:08 anoff

This never got resolved! Still breaks for me. Can we just drop the emoji or give me an option to?

I use emojis in headers here: https://github.com/Miserlou/dnd-tldr

Miserlou avatar Apr 28 '20 04:04 Miserlou

@Miserlou I think your links will work if you just remove the emojis from the links. I tried modifying the URL fragment on one of your broken links and it worked.

ccheever avatar May 08 '20 02:05 ccheever

If I remove the emojis from the links then it breaks in other apps like VSCode's markdown preview. I would like it to work on Github and other apps 🤔

ctsstc avatar May 10 '20 08:05 ctsstc

The biggest problem is that the behavior is inconsistent. For example # Alarm clock ⏰ has #alarm-clock- as link # Apple Watch ⌚️ has #apple-watch-%EF%B8%8F as link.

This gets quite annoying when automatically generating table of contents, such as I do here: https://github.com/basnijholt/home-assistant-config/blob/35f3ae3942c5d343efe133fccd85415d4bdf6501/README.md#automations---table-of-content

basnijholt avatar Sep 11 '20 07:09 basnijholt

Is there any news on this issue? This works in VSCode for me: [React](#⚛️-React)

but doesn't on github. It tries to access this URL: #⚛%EF%B8%8F-React

jon424 avatar Mar 29 '21 19:03 jon424

I'm getting these %EF%B8%8F unicodes that prints as empty chars. In the inspector I see this:

image

But hovering the link it shows this:

image

The same that goes to the URL:

image

That is hanpping with multiple emojis: ⚠️✍️⏲️🛠️⚙️☝️⚡️ and others...

But as others said, I prefer that the emojis was just added as is, not removing like that because it breaks on other softwares.

Alynva avatar Oct 04 '21 15:10 Alynva

#45 should address this by taking a new version of emoji-regex

kevin-david avatar Aug 06 '22 22:08 kevin-david