marked icon indicating copy to clipboard operation
marked copied to clipboard

Parsing ~~~~~~… takes quadratic time

Open andersk opened this issue 5 years ago • 5 comments

Describe the bug

Marked takes quadratic time to parse ~~~~~~….

To Reproduce

Steps to reproduce the behavior:

$ time node -e 'require("./lib/marked")("~".repeat(25000))'
0.94user 0.01system 0:00.96elapsed 99%CPU (0avgtext+0avgdata 32336maxresident)k
0inputs+0outputs (0major+2601minor)pagefaults 0swaps
$ time node -e 'require("./lib/marked")("~".repeat(50000))'
3.39user 0.01system 0:03.42elapsed 99%CPU (0avgtext+0avgdata 32112maxresident)k
0inputs+0outputs (0major+2599minor)pagefaults 0swaps
$ time node -e 'require("./lib/marked")("~".repeat(100000))'
13.50user 0.01system 0:13.55elapsed 99%CPU (0avgtext+0avgdata 32200maxresident)k
0inputs+0outputs (0major+2599minor)pagefaults 0swaps

Observe that the time quadruples each time the input size doubles. This could be exploited for denial of service in an application running Marked on the server side.

Expected behavior

A Markdown parser (especially one that claims “built for speed” as its primary selling point) should run in linear time on all inputs. This isn’t easy but it should be possible; several parsers including the reference C implementation seem to do a good job at this.

andersk avatar Apr 08 '19 20:04 andersk

quadruples each time the input size doubles

isn't that still considered linear time? Quadratic would be if it was n^2 right?

UziTech avatar Apr 09 '19 00:04 UziTech

Linear time, O(n), doubles each time the input size doubles. Quadratic time, O(n^2), quadruples each time the input size doubles. This is quadratic.

andersk avatar Apr 09 '19 01:04 andersk

You are right, I was thinking of exponential (2^n)

UziTech avatar Apr 09 '19 15:04 UziTech

Now that #1464 has fixed the recursion error on >>>>>>…, we can see that this is also a quadratic time case.

$ time node -e 'require("./lib/marked")(">".repeat(25000))'
0.62user 0.01system 0:00.63elapsed 101%CPU (0avgtext+0avgdata 39592maxresident)k
0inputs+0outputs (0major+4521minor)pagefaults 0swaps
$ time node -e 'require("./lib/marked")(">".repeat(50000))'
2.18user 0.02system 0:02.17elapsed 101%CPU (0avgtext+0avgdata 46456maxresident)k
0inputs+0outputs (0major+6252minor)pagefaults 0swaps
$ time node -e 'require("./lib/marked")(">".repeat(100000))'
8.31user 0.03system 0:08.28elapsed 100%CPU (0avgtext+0avgdata 64860maxresident)k
0inputs+0outputs (0major+10944minor)pagefaults 0swaps

andersk avatar May 03 '19 23:05 andersk

With GFM disabled, I still see the slowdown, so it doesn't seem to be related to strikeout syntax.

Actually... it looks like the slowdown also occurs with backticks? Must be tied to code fences in some way.

calculuschild avatar Dec 13 '21 21:12 calculuschild

Closing this as it is unlikely to occur in regular markdown and if parsing untrusted markdown ReDoS can be solved by using a worker.

UziTech avatar Aug 10 '23 05:08 UziTech

If you put yourself in the shoes of someone developing an app using a “⚡ built for speed” Markdown library to parse user Markdown, how should they know that they need to build external countermeasures against an undocumented performance bug like this?

andersk avatar Aug 10 '23 06:08 andersk

ReDoS is a well documented vulnerability. Using workers to prevent it is also well documented. I don't think a user needs to know about this vulnerability to know the issues with parsing untrusted markdown.

UziTech avatar Aug 10 '23 06:08 UziTech

Further more it is possible to cause a sql injection vulnerability by passing untrusted input to sql, but that isn't a vulnerability in sql that is a vulnerability in the implementation. Same with this vulnerability. If the user parses untrusted markdown with marked or any other markdown parser it is not a vulnerability in the parser but in the implementation.

UziTech avatar Aug 10 '23 06:08 UziTech

ReDoS is a well documented vulnerability that applies to certain poorly-written regexps. Why should a user expect that a “⚡ built for speed” Markdown parsing library is internally built with regexps that are vulnerable to ReDoS?

There are other Markdown parsing libraries that do not have this bug. In principle, it should not be fundamentally unsafe to parse untrusted Markdown. If your position is that it’s unsafe to parse untrusted Markdown with this library, then then you should document that. (You currently only document that it’s unsafe to use the output HTML.)

andersk avatar Aug 10 '23 06:08 andersk

I know of at least 10 ways to DoS markdown-it and commonmark,.js neither of which use regexps the way marked does. DoS is not only a vulnerability for regexps but any parser.

If you would like to update the docs PRs are always welcome 😁👍

UziTech avatar Aug 10 '23 06:08 UziTech

In the past I’ve reported 7 quadratic-time bugs in markdown-it and 1 in commonmark.js; they were all addressed very promptly with code fixes. If you know of further issues you should certainly report them.

andersk avatar Aug 10 '23 07:08 andersk

They have all been reported that is how I know about them. It doesn't take quadratic time to cause a DoS.

So I think we can agree that any parser can have vulnerabilities and it is not ok to run untrusted code in a way that can cause DoS. Certainly there were vulnerabilities before they fixed the ones you reported. And there are certainly vulnerabilities before the next one is found and fixed.

So the solution to this problem should be to educate users on how to mitigate that. I think we have done a good job illustrating that workaround in our docs. If you don't agree you are always allowed to create a PR to update the docs.

UziTech avatar Aug 11 '23 01:08 UziTech