feat: tokenizer extension position
Marked version: 15.0.6
Description
Add a `position` property for tokenizer extensions so they can pick where they run in the lexer.
- Fixes #3590
TODO:
- [ ] Check benchmark speed (initial check seems to be fine)
- [ ] Write tests
- [ ] Write docs
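For reference, a hedged sketch of how a consumer might register an extension with the proposed `position` property. The field name comes from this PR's description, but the accepted values and exact semantics are assumptions, not a finalized API:

```javascript
// Hypothetical tokenizer extension using the proposed `position` property.
// Assumption: `position: 'fences'` means "try this tokenizer where the
// built-in fences tokenizer runs" — the real value format may differ.
const underlineExtension = {
  name: 'underline',
  level: 'block',
  position: 'fences', // assumed name and value format
  tokenizer(src) {
    // Match a line like `:some text:` and turn it into a custom token.
    const match = /^:([^\n]+):(?:\n|$)/.exec(src);
    if (match) {
      return {
        type: 'underline',
        raw: match[0],
        text: match[1].trim(),
      };
    }
  },
  renderer(token) {
    return `<u>${token.text}</u>\n`;
  },
};

// Registration would presumably look like today's extension API:
// marked.use({ extensions: [underlineExtension] });
```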
Contributor
- [ ] Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
- [ ] no tests required for this PR.
- [ ] If submitting new feature, it has been documented in the appropriate places.
Committer
In most cases, this should be a different person than the contributor.
- [ ] CI is green (no forced merge required).
- [ ] Squash and Merge PR following conventional commit guidelines.
I haven't run any benchmarks on this, but I'm curious how much of a slowdown this adds, assuming no extensions. Do you have any measurements?
As an alternative that might be more flexible, but more restructuring... I wonder what you think about extracting each of the parser/renderer steps into an array of "extension-like objects".
I.e., instead of:
```js
beforeCode();
// code
if (token = this.tokenizer.code(src)) {
  ...
}

beforeFences();
// fences
if (token = this.tokenizer.fences(src)) {
  ...
}
```
something like (pseudocode):
```js
const tokenizers = {
  code: () => { if (token = this.tokenizer.code(src)) { ... } },
  fences: () => { if (token = this.tokenizer.fences(src)) { ... } },
  ...
};

for (const [tokenizerName, tokenizerFunction] of Object.entries(tokenizers)) {
  runExtensionBefore(tokenizerName);
  tokenizerFunction();
}
```
I'm fairly certain there is some speed impact calling the tokenizers from an array/map/object like this but it might make the whole project more flexible for this type of "extension position" customization.
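To put a rough number on that concern, here is a micro-benchmark sketch comparing direct calls against `Object.entries` dispatch. The tokenizer stand-ins are simplified placeholders, not marked's real tokenizers, so this only measures dispatch overhead:

```javascript
// Stand-ins for two tokenizer steps; the real ones do regex work, so
// this isolates the cost of the dispatch mechanism itself.
const tokenizer = {
  code: (src) => (src.startsWith('    ') ? { type: 'code' } : null),
  fences: (src) => (src.startsWith('```') ? { type: 'fences' } : null),
};

// Current style: hard-coded sequence of calls.
function lexDirect(src) {
  let token;
  if (token = tokenizer.code(src)) return token;
  if (token = tokenizer.fences(src)) return token;
  return null;
}

// Proposed style: iterate an ordered list of named steps.
const steps = Object.entries(tokenizer);
function lexDispatched(src) {
  for (const [name, fn] of steps) {
    // An extension hook keyed by `name` would run here.
    const token = fn(src);
    if (token) return token;
  }
  return null;
}

function bench(fn, src, iterations = 1e6) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn(src);
  return Number(process.hrtime.bigint() - start) / 1e6; // milliseconds
}

console.log('direct    :', bench(lexDirect, '```js').toFixed(1), 'ms');
console.log('dispatched:', bench(lexDispatched, '```js').toFixed(1), 'ms');
```

The absolute numbers depend on the runtime, but comparing the two lines gives a feel for whether the loop-and-lookup overhead matters relative to the regex work real tokenizers do.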
I did try to move the tokenizers to some sort of array but there were two problems.
- Some tokenizers require a lot of extra logic that is not easy to move out of the lexer.
- I couldn't think of an easy way to expose the array to users so they can pick where to insert their tokenizers.
I think just adding a `position` property like this is easiest for the user, and even though it adds a lot of boilerplate, we are unlikely to add or remove tokenizers from the lexer.
> I did try to move the tokenizers to some sort of array but there were two problems.
Ok, fair enough. Looking back, this is what I tried to do back in 2021 and I think we concluded arrays just add too much slowdown anyway. https://github.com/markedjs/marked/pull/1872
Also looking back, we had discussed a "before" parameter 4 years ago. Hoping it wasn't also a lag issue that made us drop it https://github.com/markedjs/marked/pull/2043#issuecomment-839215243
Remaining comments:
Ya, these 3 are in the TODO section in this PR's description. I will work on them soon.
> Also looking back, we had discussed a "before" parameter 4 years ago. Hoping it wasn't also a lag issue that made us drop it
Looks like we just figured we would get it done when it was actually needed.
After months of trying different things to see if there is a way this won't slow down marked, I haven't been able to find one.
The only thing I can think of doing is making this an extension that uses the provideLexer hook to supply a different lexer that changes the order. That way it won't slow down default marked, and someone can opt in if they are ok with the slowdown.
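To illustrate the opt-in idea, here is a minimal self-contained model of how a provideLexer-style hook lets an extension swap in a lexer with a different tokenizer order. This is not marked's real internals: the `use`/`parse` functions and the `ReorderedLexer` subclass are illustrative stand-ins, and marked's actual `provideLexer` hook signature may differ:

```javascript
// Default lexer: the fast path, untouched when no extension is installed.
class DefaultLexer {
  lex(src) {
    return [{ type: 'default-order', raw: src }];
  }
}

// Hypothetical opt-in lexer that would try extension tokenizers at
// user-chosen positions; here it just tags its output differently.
class ReorderedLexer extends DefaultLexer {
  lex(src) {
    return [{ type: 'reordered', raw: src }];
  }
}

const hooks = { provideLexer: null };

// Simplified stand-in for marked.use(): record the hook if provided.
function use(extension) {
  if (extension.hooks?.provideLexer) {
    hooks.provideLexer = extension.hooks.provideLexer;
  }
}

// Simplified stand-in for marked.parse(): ask the hook for a lexer
// class, falling back to the default when no extension is installed.
function parse(src) {
  const LexerClass = hooks.provideLexer?.() ?? DefaultLexer;
  return new LexerClass().lex(src);
}

// Opting in: only users who install the extension pay for the reordering.
use({ hooks: { provideLexer: () => ReorderedLexer } });
```

The key property is that the default path never pays for the reordering logic: the cost only appears for users who explicitly install the extension.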