rehype-pretty-code
rehype-pretty-code copied to clipboard
Support arbitrary delimiter in wordHighlighter
Hi!
Context
I needed to match a pattern containing a /
character, therefore /
being used as the hard-coded delimiter proved limiting.
Proposal
Would you consider a PR building upon this core change (I used patch-package to get things going quickly):
diff --git a/node_modules/rehype-pretty-code/dist/rehype-pretty-code.cjs b/node_modules/rehype-pretty-code/dist/rehype-pretty-code.cjs
index 54c1c36..26d8a36 100644
--- a/node_modules/rehype-pretty-code/dist/rehype-pretty-code.cjs
+++ b/node_modules/rehype-pretty-code/dist/rehype-pretty-code.cjs
@@ -12897,12 +12897,12 @@ function rehypePrettyCode(options = {}) {
const wordIdsMap = new Map();
const wordMatches = meta
- ? [...meta.matchAll(/\/(.*?)\/(\S*)/g)]
+ ? [...meta.matchAll(/(?<delimiter>["\/])(?<word>.*?)\k<delimiter>(?<wordIdAndOrRange>\S*)/g)]
: undefined;
if (Array.isArray(wordMatches)) {
wordMatches.forEach((name, index) => {
- const word = wordMatches[index][1];
- const wordIdAndOrRange = wordMatches[index][2];
+ const { word, wordIdAndOrRange } = name.groups;
words.push(word);
const [range, id] = wordIdAndOrRange.split('#');
Remarks
- Here’s a test sample on Regex101 (I used PCRE as I feel like it’s closer to V8’s Irregexp engine than EcmaScript’s, but it works regardless the engine).
- I leveraged named capture groups to increase readibility, which have been supported since Node 10. As rehype-pretty-code requires
"node": "^12.16.0 || >=13.2.0"
I feel like it’s fine using them.
Questions
- There still is a hard-coded list of allowed delimiters, hence that will never cover all use-cases (eg. what if my pattern contains both
"
and/
?). I feel like we could provide a sane, exported default list and also allow developers to provide their own as an escape hatch (and because we’d export the default list, they could build upon it for maximum compatibility/future-proofness).- Another option would be to support pattern-matching within the pattern-matching. It seems you considered it for the doc reads Note regex itself is not yet supported.
- Hacking on the code, I felt like the naming for some variables is subpar.
words
/word
in particular, asword
may actually contain multiple words (any pattern the developer wants to match within the code block, really). I didn’t alter that naming but I’d propose doing it in the PR.
What do you think?
There still is a hard-coded list of allowed delimiters, hence that will never cover all use-cases (eg. what if my pattern contains both " and /?).
Is it not possible to use \
to add escaping like strings when they contain '
or "
?
Hacking on the code, I felt like the naming for some variables is subpar. words/word in particular, as word may actually contain multiple words (any pattern the developer wants to match within the code block, really). I didn’t alter that naming but I’d propose doing it in the PR.
I agree with this, it was named like that as there were limitations to how it worked but it's been able to capture any string for a year now, there's an issue open: https://github.com/atomiks/rehype-pretty-code/issues/64
There still is a hard-coded list of allowed delimiters, hence that will never cover all use-cases (eg. what if my pattern contains both " and /?).
Is it not possible to use
\
to add escaping like strings when they contain'
or"
?
With the current implementation I guess not, the meta
string is dumb-parsed. Support for escaping sequences would require treating each pattern (the infamous word
variable) as a regexp itself, not as a mere string. Maybe that's the right call.
Hacking on the code, I felt like the naming for some variables is subpar. words/word in particular, as word may actually contain multiple words (any pattern the developer wants to match within the code block, really). I didn’t alter that naming but I’d propose doing it in the PR.
I agree with this, it was named like that as there were limitations to how it worked but it's been able to capture any string for a year now, there's an issue open: #64
Thanks for pointing that out.