Race condition for async parsing when hooks are present
Marked version:
16.4.1
Describe the bug
When multiple parsing operations happen in parallel, with one for regular block parsing and one inline, the indication if parsing is block or inline can bleed between the calls. In my tests this happens when a hook is present.
To Reproduce Steps to reproduce the behavior:
import { Marked } from 'marked';
const marked = new Marked();
marked.use({ hooks: { postprocess: (html) => html.replaceAll('this', 'THIS') } });
console.log(
await Promise.all([
marked.parseInline('this is inline before block', { async: true }),
marked.parse('this is block after inline', { async: true }),
]),
);
console.log(
await Promise.all([
marked.parse('this is block before inline', { async: true }),
marked.parseInline('this is inline after block', { async: true }),
]),
);
In the code above, Promise.all will cause both instances in each group to run in "parallel" (notwithstanding the event loop), so the second parse operation will start before the previous one ended.
This results in:
[
'<p>THIS is inline before block</p>\n',
'<p>THIS is block after inline</p>\n'
]
[ 'THIS is block before inline', 'THIS is inline after block' ]
Note that the code had one inline and one block parsing in each group, but in both cases the inline/block parsing caused both instances to have the same inline/block setting.
Expected behavior
[ 'THIS is inline before block', '<p>THIS is block after inline</p>\n' ]
[ '<p>THIS is block before inline</p>\n', 'THIS is inline after block' ]
I just tested the same code with 16.4.2 and 17.0.0 and the problem persists.
The issue seems to be here: https://github.com/markedjs/marked/blob/23b9d0137fb260ff0033d3c56cdce386201d5fa6/src/Instance.ts#L303-L306
The opt.hooks object is shared between the different calls to parse, and the block type is set in that hooks object. Then, in:
https://github.com/markedjs/marked/blob/23b9d0137fb260ff0033d3c56cdce386201d5fa6/src/Instance.ts#L311
the provideLexer() method is called with the blockType set at the object level with both calls using the same value. A similar issue is found a few lines down with the call to provideParser().
Possible solutions here include:
- changing the signature to hooks functions to include the block type (and possibly options). This breaks backwards compatibility for any existing user that relies on
this.blockfor a custom lexer/parser (or even other hooks), which is a documented feature, so this is probably not a good idea. - cloning the hooks object in each call to
parse. A simple structured clone or spread argument is not possible because the hooks object is a class instance and structured clones don't copy methods, so a custom clone operation would need to be done. - providing a custom
thisobject to theprovideLexer()function that includes the block and options. This might not be that different than the previous option. - something else?
Nice catch! I'm curious if this was found actually doing many block and inline parses asynchronously, and why?
A simple fix would be different Marked instances for block and inline. I wonder if this should just be documented?
Otherwise I like the custom this to capture the options at the beginning.
Nice catch! I'm curious if this was found actually doing many block and inline parses asynchronously, and why?
Our use case that triggered this find is a student assessment system. We are using markdown for grader rubrics. The rubric text is inline markdown, while the extended explanation and grader instructions are block markdown.
In our case we ended up not actually needing the async, as there was no meaningful performance difference between parsing all rubrics in sequence without async or in parallel with it, and we don't have async hooks, so disabling async resolved it for us, but we're raising it here as it may affect others with similar scenarios.
A simple fix would be different Marked instances for block and inline. I wonder if this should just be documented?
Otherwise I like the custom
thisto capture the options at the beginning.
I'm happy with either option. The documentation update would probably be the least intrusive option, I think, who knows what users may be relying on... https://xkcd.com/1172/