
feat: tokenizer extension position

Open UziTech opened this issue 1 year ago • 5 comments

Marked version: 15.0.6

Description

Add a position property to tokenizer extensions so they can pick where they run in the lexer.

  • Fixes #3590
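As a rough illustration of the idea (the `position` property name comes from this PR's title; the exact extension shape, the `applyPosition` helper, and the fallback behavior below are assumptions, not marked's actual implementation), a position lets an extension's tokenizer be spliced in before a named built-in tokenizer. A minimal self-contained sketch of that ordering:

```javascript
// Simplified built-in block tokenizer order in the lexer.
const defaultOrder = ['code', 'fences', 'heading', 'hr', 'blockquote', 'list'];

// Hypothetical extension declaring "run me before `fences`".
const extension = { name: 'myBlock', position: 'fences' };

// Build the effective run order by splicing the extension in
// just before the tokenizer it names.
function applyPosition(order, ext) {
  const i = order.indexOf(ext.position);
  const result = order.slice();
  // Unknown position: fall back to the front (an assumption, not the PR's behavior).
  result.splice(i === -1 ? 0 : i, 0, ext.name);
  return result;
}

console.log(applyPosition(defaultOrder, extension));
// → ['code', 'myBlock', 'fences', 'heading', 'hr', 'blockquote', 'list']
```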

TODO:

  • [ ] Check benchmark speed (initial check seems to be fine)
  • [ ] Write tests
  • [ ] Write docs

Contributor

  • [ ] Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
  • [ ] no tests required for this PR.
  • [ ] If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.

UziTech avatar Jan 18 '25 19:01 UziTech


I haven't run any benchmarks on this, but I'm curious how much of a slowdown this adds, assuming no extensions. Do you have any measurements?

As an alternative that might be more flexible, but more restructuring... I wonder what you think about extracting each of the parser/renderer steps into an array of "extension-like objects".

I.e., instead of:

      beforeCode();
      // code
      if (token = this.tokenizer.code(src)) {
        ...
      }

      beforeFences();
      // fences
      if (token = this.tokenizer.fences(src)) {
        ...

something like (pseudocode):

const tokenizers = {
  code: () => { if (token = this.tokenizer.code(src)) { /* ... */ } },
  fences: () => { if (token = this.tokenizer.fences(src)) { /* ... */ } },
  // ...
};

for (const [tokenizerName, tokenizerFunction] of Object.entries(tokenizers)) {
  runExtensionBefore(tokenizerName);
  tokenizerFunction();
}
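To make the pseudocode above concrete, here is a self-contained sketch of the loop (the name `runExtensionBefore` comes from the comment above; the tokenizer bodies are stand-ins, not marked's real implementations):

```javascript
// Stand-in tokenizers keyed by name; each returns a token or undefined.
const tokenizers = {
  code: (src) => (src.startsWith('    ') ? { type: 'code' } : undefined),
  fences: (src) => (src.startsWith('```') ? { type: 'fences' } : undefined),
};

// Extensions registered to run just before a named built-in tokenizer.
const extensionsBefore = {
  fences: [(src) => undefined], // placeholder extension that never matches
};

function runExtensionsBefore(name, src) {
  for (const ext of extensionsBefore[name] ?? []) {
    const token = ext(src);
    if (token) return token;
  }
}

// The lexer loop: try extensions before each built-in, then the built-in itself.
function tokenize(src) {
  for (const [name, tokenizer] of Object.entries(tokenizers)) {
    const token = runExtensionsBefore(name, src) ?? tokenizer(src);
    if (token) return token;
  }
}

console.log(tokenize('```js')); // → { type: 'fences' }
```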

I'm fairly certain there is some speed impact from calling the tokenizers through an array/map/object like this, but it might make the whole project more flexible for this type of "extension position" customization.

calculuschild avatar Feb 24 '25 18:02 calculuschild

I did try to move the tokenizers to some sort of array but there were two problems.

  1. Some tokenizers require a lot of extra logic that is not easy to move out of the lexer.
  2. I couldn't think of an easy way to expose the array to users so they can pick where to put their tokenizers.

I think just adding a position property like this is easiest for the user, and even though it adds a lot of boilerplate, we are not likely to be adding or removing tokenizers from the lexer anyway.

UziTech avatar Feb 26 '25 02:02 UziTech

I did try to move the tokenizers to some sort of array but there were two problems.

Ok, fair enough. Looking back, this is what I tried to do in 2021, and I think we concluded arrays just add too much slowdown anyway. https://github.com/markedjs/marked/pull/1872

Also looking back, we discussed a "before" parameter 4 years ago. Hoping it wasn't also a lag issue that made us drop it. https://github.com/markedjs/marked/pull/2043#issuecomment-839215243

calculuschild avatar Feb 28 '25 19:02 calculuschild

Remaining comments:

Ya, these 3 are in the TODO section of this PR's description. I will work on them soon.

Also looking back, we had discussed a "before" parameter 4 years ago. Hoping it wasn't also a lag issue that made us drop it

Looks like we just figured we would get it done when it was actually needed.

UziTech avatar Feb 28 '25 20:02 UziTech

After months of trying different things to see if there is a way this won't slow down marked, I haven't been able to find one.

The only thing I can think of doing is making this an extension that uses the provideLexer hook to supply a different lexer that changes the order. That way it won't slow down default marked, and someone can opt in if they are ok with the slowdown.
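A self-contained sketch of the opt-in pattern being described (the hook name `provideLexer` is from marked's hooks API; everything else here, including the lexer functions and the `hooks` registry, is a simulation for illustration, not marked's code):

```javascript
// Default lexer: fixed tokenizer order, no per-extension position checks.
function defaultLexer(src) {
  return `default:${src}`;
}

// Opt-in lexer that supports repositioned tokenizers
// (assumed slower, which is why it is not the default).
function reorderableLexer(src) {
  return `reorderable:${src}`;
}

// Simulated hooks registry: an extension can supply a replacement lexer,
// mirroring marked's `provideLexer` hook.
const hooks = { provideLexer: null };

function lex(src) {
  const lexer = hooks.provideLexer ? hooks.provideLexer() : defaultLexer;
  return lexer(src);
}

console.log(lex('# hi'));              // → default:# hi   (no slowdown)
hooks.provideLexer = () => reorderableLexer;
console.log(lex('# hi'));              // → reorderable:# hi   (opted in)
```

The design point is that the extra lookup only happens once per parse, so users who never register the hook pay nothing for the reordering feature.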

UziTech avatar Jul 25 '25 06:07 UziTech