
feat: tokenizer extension position

Open UziTech opened this issue 1 year ago • 5 comments

Marked version: 15.0.6

Description

Add a position property to tokenizer extensions so they can pick where they run in the lexer.

  • Fixes #3590
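As a rough illustration of the idea (the `position` property name comes from this PR's title; the exact extension shape, the `applyPosition` helper, and the fallback behavior below are assumptions, not marked's actual implementation), a position lets an extension's tokenizer be spliced in before a named built-in tokenizer. A minimal self-contained sketch of that ordering:

```javascript
// Simplified built-in block tokenizer order in the lexer.
const defaultOrder = ['code', 'fences', 'heading', 'hr', 'blockquote', 'list'];

// Hypothetical extension declaring "run me before `fences`".
const extension = { name: 'myBlock', position: 'fences' };

// Build the effective run order by splicing the extension in
// just before the tokenizer it names.
function applyPosition(order, ext) {
  const i = order.indexOf(ext.position);
  const result = order.slice();
  // Unknown position: fall back to the front (an assumption, not the PR's behavior).
  result.splice(i === -1 ? 0 : i, 0, ext.name);
  return result;
}

console.log(applyPosition(defaultOrder, extension));
// → ['code', 'myBlock', 'fences', 'heading', 'hr', 'blockquote', 'list']
```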

TODO:

  • [ ] Check benchmark speed (initial check seems to be fine)
  • [ ] Write tests
  • [ ] Write docs

Contributor

  • [ ] Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
  • [ ] no tests required for this PR.
  • [ ] If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.

UziTech avatar Jan 18 '25 19:01 UziTech


I haven't run any benchmarks on this, but I'm curious how much of a slowdown this adds, assuming no extensions. Do you have any measurements?

As an alternative that might be more flexible, but more restructuring... I wonder what you think about extracting each of the parser/renderer steps into an array of "extension-like objects".

I.e., instead of:

      beforeCode();
      // code
      if (token = this.tokenizer.code(src)) {
        ...
      }

      beforeFences();
      // fences
      if (token = this.tokenizer.fences(src)) {
        ...

something like (pseudocode):

const tokenizers = {
  code: () => { if (token = this.tokenizer.code(src)) { /* ... */ } },
  fences: () => { if (token = this.tokenizer.fences(src)) { /* ... */ } },
  // ...
};

for (const [tokenizerName, tokenizerFunction] of Object.entries(tokenizers)) {
  runExtensionBefore(tokenizerName);
  tokenizerFunction();
}
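To make the pseudocode above concrete, here is a self-contained sketch of the loop (the name `runExtensionBefore` comes from the comment above; the tokenizer bodies are stand-ins, not marked's real implementations):

```javascript
// Stand-in tokenizers keyed by name; each returns a token or undefined.
const tokenizers = {
  code: (src) => (src.startsWith('    ') ? { type: 'code' } : undefined),
  fences: (src) => (src.startsWith('```') ? { type: 'fences' } : undefined),
};

// Extensions registered to run just before a named built-in tokenizer.
const extensionsBefore = {
  fences: [(src) => undefined], // placeholder extension that never matches
};

function runExtensionsBefore(name, src) {
  for (const ext of extensionsBefore[name] ?? []) {
    const token = ext(src);
    if (token) return token;
  }
}

// The lexer loop: try extensions before each built-in, then the built-in itself.
function tokenize(src) {
  for (const [name, tokenizer] of Object.entries(tokenizers)) {
    const token = runExtensionsBefore(name, src) ?? tokenizer(src);
    if (token) return token;
  }
}

console.log(tokenize('```js')); // → { type: 'fences' }
```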

I'm fairly certain there is some speed impact from calling the tokenizers through an array/map/object like this, but it might make the whole project more flexible for this type of "extension position" customization.

calculuschild avatar Feb 24 '25 18:02 calculuschild

I did try to move the tokenizers to some sort of array but there were two problems.

  1. Some tokenizers require a lot of extra logic that is not easy to move out of the lexer.
  2. I couldn't think of an easy way to expose the array to users so they can pick where to put their tokenizers.

I think just adding a position property like this is easiest for the user, and even though it adds a lot of boilerplate, we are not likely to be adding or removing tokenizers from the lexer anyway.

UziTech avatar Feb 26 '25 02:02 UziTech

I did try to move the tokenizers to some sort of array but there were two problems.

Ok, fair enough. Looking back, this is what I tried to do in 2021, and I think we concluded arrays just add too much slowdown anyway. https://github.com/markedjs/marked/pull/1872

Also looking back, we discussed a "before" parameter 4 years ago. Hoping it wasn't also a lag issue that made us drop it. https://github.com/markedjs/marked/pull/2043#issuecomment-839215243

calculuschild avatar Feb 28 '25 19:02 calculuschild

Remaining comments:

Ya, these 3 are in the TODO section of this PR's description. I will work on them soon.

Also looking back, we had discussed a "before" parameter 4 years ago. Hoping it wasn't also a lag issue that made us drop it

Looks like we just figured we would get it done when it was actually needed.

UziTech avatar Feb 28 '25 20:02 UziTech

After months of trying different things to see if there is a way this won't slow down marked, I haven't been able to find one.

The only thing I can think of doing is making this an extension that uses the provideLexer hook to supply a different lexer that changes the order. That way it won't slow down default marked, and someone can opt in if they are ok with the slowdown.
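A self-contained sketch of the opt-in pattern being described (the hook name `provideLexer` is from marked's hooks API; everything else here, including the lexer functions and the `hooks` registry, is a simulation for illustration, not marked's code):

```javascript
// Default lexer: fixed tokenizer order, no per-extension position checks.
function defaultLexer(src) {
  return `default:${src}`;
}

// Opt-in lexer that supports repositioned tokenizers
// (assumed slower, which is why it is not the default).
function reorderableLexer(src) {
  return `reorderable:${src}`;
}

// Simulated hooks registry: an extension can supply a replacement lexer,
// mirroring marked's `provideLexer` hook.
const hooks = { provideLexer: null };

function lex(src) {
  const lexer = hooks.provideLexer ? hooks.provideLexer() : defaultLexer;
  return lexer(src);
}

console.log(lex('# hi'));              // → default:# hi   (no slowdown)
hooks.provideLexer = () => reorderableLexer;
console.log(lex('# hi'));              // → reorderable:# hi   (opted in)
```

The design point is that the extra lookup only happens once per parse, so users who never register the hook pay nothing for the reordering feature.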

UziTech avatar Jul 25 '25 06:07 UziTech