markdig icon indicating copy to clipboard operation
markdig copied to clipboard

Depth limit exceeded on non-malicious markdown content

Open yufeih opened this issue 3 years ago • 6 comments

This file raises depth limit exceeded exception in version 0.23.0 due to #500.

Some additional info:

  • That file is an auto generated API reference page.
  • I cannot give a reasonable depth limit that don't break all content in docs.microsoft.com, but a depth limit configuration is a good start to mitigate the problem.
  • I did some perf profiling of markdig on a bunch of real markdown files used by docs.microsoft.com and the hottest function inside markdig is FindLastContainer. I don't have enough time to dig deeper, not sure if there are faster ways to find the last container (like tracking its location instead of search the tree each time?)

https://github.com/dotnet/docfx/pull/7016

yufeih avatar Jan 21 '21 07:01 yufeih

Just looking at the file, we shouldn't be approaching the current 10k limit 🤔

I will look into what is happening here this week.

MihaZupan avatar Jan 21 '21 13:01 MihaZupan

Something weird is happening with table parsing here.

Whenever we have single backticks '`' in different lines, Markdig will give up on the table and parse it into a very deep model - one nested level for each pipe character!

Example demonstrating this: https://gist.github.com/MihaZupan/98621af5788f570f0dd0b809f1f3489b Note the "BadTable" being rather small, but with 2 extra backticks - this will fail to parse as a table in Markdig. The "GoodTable", approaching but under 10k cells, will parse as a table just fine.

You can see that this is happening for the sample file you mentioned. If you browse to the rendered docs page, you will see that it's completely broken: https://docs.microsoft.com/en-us/graph/api/intune-deviceconfig-macosdevicefeaturesconfiguration-create?view=graph-rest-beta#request-body

In the sample file, the problem is caused by the standalone ` in this line of the table (left of "console"):

consoleAccessDisabled Boolean Whether the Other user will disregard use of the `>console> special user name.

So the issue for that specific file can be solved by just removing the extra backtick, but we'll have to investigate what's going on overall.

MihaZupan avatar Jan 22 '21 01:01 MihaZupan

Nice catch! I could forward to the relevant folks to see if we could escape the problematic ` in the API generation process.

GitHub seems to be rendering that table just fine, I'm not sure if this is another difference between GFM and CommonMark.

yufeih avatar Jan 22 '21 03:01 yufeih

GitHub seems to be rendering that table just fine, I'm not sure if this is another difference between GFM and CommonMark.

Yeah, it's a known issue. GitHub is not using the same logic to decode a table. I started before they did their work on CommonMark and we didn't choose the same behavior.

xoofx avatar Jan 22 '21 05:01 xoofx

Also, the code for parsing a grid table in markdig is awful to work with, I'm not really proud of it 😅 , so if anyone is interested in making an implementation that is GitHub compliant, you are welcome 😉

xoofx avatar Jan 22 '21 05:01 xoofx

I was trying to understand this a bit more as the error scenario behavior was not consistent. I think I understand the root cause by sifting through the code.

Doesnot work:

| Header 1  | Another header here | This is a long header |
| --------  | ------------------- | --------------------- |
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 
A basic table

Works:

| Header 1  | Another header here | This is a long header |
| --------  | ------------------- | --------------------- |
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 
| Some data | Some more data      | data                  | 
| data      | Some long data here | more data             | 

A basic table

Please notice how i have a new line character explicitly after the table has ended. This helped the BlockParser to put the A basic table into a seperate block as opposed to including it with the table itself.

Let me know if this makes sense.

saipramod avatar Apr 07 '22 19:04 saipramod