Option to treat `,` as a word delimiter for info-strings
Currently, when parsing the info-string to determine the code block language, comrak uses the word up to the first space character:
https://github.com/kivikakk/comrak/blob/03238b81d0917acbea73a2d255603f8051e4dc04/src/html.rs#L490-L492
This causes issues with Markdown such as the regex README, which uses the info-string `rust,ignore`. That string is passed to the syntax highlighter as `rust,ignore` and applied to the element as the attribute `class="language-rust,ignore"`. Both rustdoc and GitHub treat the `,` character as a delimiter between the language and additional attributes (rustdoc actually supports more delimiters, but that's for back-compat; AFAIK only `[ ,]` is intended to be used).
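For illustration, here is a minimal sketch of the delimiter rule described above: the language is everything before the first space or comma. The function name `info_string_language` is hypothetical and is not part of comrak's API.

```rust
/// Hypothetical helper: return the language token of a fenced code block's
/// info string, treating both ' ' and ',' as delimiters (the `[ ,]` rule).
fn info_string_language(info: &str) -> &str {
    let info = info.trim_start();
    // Cut at the first space or comma; with neither, the whole string is the language.
    match info.find(|c: char| c == ' ' || c == ',') {
        Some(end) => &info[..end],
        None => info,
    }
}
```

With this rule both `rust,ignore` and `rust ignore` yield `rust`, so the highlighter lookup and the `language-` class would both see just the language.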
Then we should definitely support it, imo. PRs happily accepted.
Just noticed this too. I'd be happy to take on implementing this!
Implementation-wise, where exactly should this change go? Should it change the parsed AST / HTML to split the info string's language on a comma (this would seem to diverge from cmark-gfm, which takes the first word up to the space), or should it just change the `syntect` plugin to split on a comma when trying to find a matching syntax?
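A rough sketch of the second option, under stated assumptions: the parser keeps cmark-gfm's behaviour (so the AST still holds `rust,ignore`), and only the highlighter drops the comma suffix when matching. `find_syntax` is hypothetical, and the `known` slice is a stand-in for whatever syntax set the real plugin queries (syntect exposes lookups such as `SyntaxSet::find_syntax_by_token`, but it is not used here).

```rust
/// Hypothetical plugin-side lookup. `lang` is the cmark-gfm-style language
/// token (first word up to a space, e.g. "rust,ignore"); `known` stands in
/// for the syntaxes a real highlighter knows about.
fn find_syntax<'a>(known: &[&'a str], lang: &str) -> Option<&'a str> {
    // Ignore everything from the first comma on: "rust,ignore" -> "rust".
    let token = lang.split(',').next().unwrap_or(lang);
    known
        .iter()
        .copied()
        .find(|name| name.eq_ignore_ascii_case(token))
}
```

The trade-off is that the emitted HTML would still carry `class="language-rust,ignore"`, since only the lookup changes.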
Out of pure curiosity, how does GitHub use this comma delimiting feature? I tried to find an example on their help docs but couldn’t find an explanation.
> or should it just change the `syntect` plugin to split off of a comma when trying to find a matching syntax?
We did this for docs.rs' highlighting plugin (also using syntect, but we need to customize it more). It works fine for the actual highlighting, but fixing up the attributes in `write_code_tag` is a pain: https://github.com/rust-lang/docs.rs/blob/eb803472b52aac49fb0c8a736d7d74f87533e12d/src/web/markdown.rs#L37-L50
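The attribute fix-up mentioned above amounts to rewriting the `class` value. A minimal sketch, assuming the adapter receives the attributes as a `HashMap<String, String>` (an assumption; check the adapter trait for the comrak version in use). `strip_class_suffix` is a hypothetical name, not the docs.rs code.

```rust
use std::collections::HashMap;

/// Hypothetical helper: turn `class="language-rust,ignore"` into
/// `class="language-rust"`, leaving other class values untouched.
fn strip_class_suffix(attributes: &mut HashMap<String, String>) {
    if let Some(class) = attributes.get_mut("class") {
        if class.starts_with("language-") {
            if let Some(comma) = class.find(',') {
                // Drop the comma and everything after it.
                class.truncate(comma);
            }
        }
    }
}
```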
> Out of pure curiosity, how does GitHub use this comma delimiting feature?
AFAIK it doesn't; it just strips everything after the comma at some point between Markdown and HTML.
Ah, got it—the stripping is simply to remove the info, not do anything special with it. 👍