Option to treat `,` as a word delimiter for info-strings

Open Nemo157 opened this issue 3 years ago • 6 comments

Currently when parsing the info-string to determine the codeblock language the word up to the first space character is used:

https://github.com/kivikakk/comrak/blob/03238b81d0917acbea73a2d255603f8051e4dc04/src/html.rs#L490-L492

This causes issues with markdown such as the regex readme, which uses the info-string `rust,ignore`: the whole string `rust,ignore` is passed into the syntax highlighter and applied as the attribute `class="language-rust,ignore"` on the element. Both rustdoc and GitHub support the `,` character as a delimiter between the language and additional attributes (rustdoc actually supports more, but that's for back-compat; afaik only `[ ,]` is intended to be used).
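To illustrate the requested behaviour, here is a minimal sketch. `language_from_info` is a hypothetical helper, not part of comrak's API; it assumes the info-string has already had leading whitespace trimmed, and it treats both whitespace and `,` as the end of the language token:

```rust
/// Hypothetical helper (not comrak's API): returns the language token of a
/// fenced code block's info-string, stopping at the first whitespace
/// character or `,`, whichever comes first.
fn language_from_info(info: &str) -> &str {
    info.split(|c: char| c.is_whitespace() || c == ',')
        .next()
        // `split` always yields at least one item; this is only a fallback.
        .unwrap_or("")
}
```

With this, `language_from_info("rust,ignore")` yields `rust`, so the class would become `language-rust` instead of `language-rust,ignore`. An info-string without a comma, such as `rust ignore`, still yields `rust`.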

Nemo157 avatar Oct 16 '22 16:10 Nemo157

Then we should definitely support it, imo. PRs happily accepted.

kivikakk avatar Oct 24 '22 10:10 kivikakk

Just noticed this too. I'd be happy to take on implementing this!

CosmicHorrorDev avatar Jun 26 '23 19:06 CosmicHorrorDev

Implementation-wise where exactly should this change go? Should it change the parsed AST / HTML to split the language for the info string on a comma (this would seem to diverge from cmark-gfm which includes the first word up to the space), or should it just change the syntect plugin to split off of a comma when trying to find a matching syntax?

CosmicHorrorDev avatar Jun 27 '23 23:06 CosmicHorrorDev

Out of pure curiosity, how does GitHub use this comma delimiting feature? I tried to find an example on their help docs but couldn’t find an explanation.

gjtorikian avatar Jun 28 '23 01:06 gjtorikian

> or should it just change the syntect plugin to split off of a comma when trying to find a matching syntax?

We did this for docs.rs' highlighting plugin (using syntect as well, but we need to customize it more), it's ok for the actual highlighting, but fixing up the attributes in write_code_tag is a pain: https://github.com/rust-lang/docs.rs/blob/eb803472b52aac49fb0c8a736d7d74f87533e12d/src/web/markdown.rs#L37-L50
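As a rough idea of what that attribute fix-up involves, here is a sketch. It is not the docs.rs code linked above; `strip_class_attributes` is a hypothetical name, and the sketch assumes the tag's attributes arrive as a `HashMap<String, String>` with the language stored under `class`:

```rust
use std::collections::HashMap;

/// Hypothetical fix-up (not comrak's or docs.rs' actual code): rewrites a
/// `class` value such as `language-rust,ignore` to `language-rust` by
/// dropping everything from the first `,` onwards.
fn strip_class_attributes(attributes: &mut HashMap<String, String>) {
    if let Some(class) = attributes.get_mut("class") {
        // `,` is ASCII, so the byte index returned by `find` is a valid
        // char boundary for `truncate`.
        if let Some(idx) = class.find(',') {
            class.truncate(idx);
        }
    }
}
```

Other attributes are left untouched, and a `class` without a comma is unchanged.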

> Out of pure curiosity, how does GitHub use this comma delimiting feature?

AFAIK it doesn't; it just strips everything after the comma at some point between markdown -> html.

Nemo157 avatar Jun 28 '23 08:06 Nemo157

Ah, got it—the stripping is simply to remove the info, not do anything special with it. 👍

gjtorikian avatar Jun 28 '23 14:06 gjtorikian