markdig
markdig copied to clipboard
Depth limit exceeded on non-malicious markdown content
This file raises depth limit exceeded exception in version 0.23.0 due to #500.
Some additional info:
- That file is an auto generated API reference page.
- I cannot give a reasonable depth limit that don't break all content in docs.microsoft.com, but a depth limit configuration is a good start to mitigate the problem.
- I did some perf profiling of markdig on a bunch of real markdown files used by docs.microsoft.com and the hottest function inside markdig is
FindLastContainer
. I don't have enough time to dig deeper, not sure if there are faster ways to find the last container (like tracking its location instead of search the tree each time?)
https://github.com/dotnet/docfx/pull/7016
Just looking at the file, we shouldn't be approaching the current 10k limit 🤔
I will look into what is happening here this week.
Something weird is happening with table parsing here.
Whenever we have single backticks '`' in different lines, Markdig will give up on the table and parse it into a very deep model - one nested level for each pipe character!
Example demonstrating this: https://gist.github.com/MihaZupan/98621af5788f570f0dd0b809f1f3489b Note the "BadTable" being rather small, but with 2 extra backticks - this will fail to parse as a table in Markdig. The "GoodTable", approaching but under 10k cells, will parse as a table just fine.
You can see that this is happening for the sample file you mentioned. If you browse to the rendered docs page, you will see that it's completely broken: https://docs.microsoft.com/en-us/graph/api/intune-deviceconfig-macosdevicefeaturesconfiguration-create?view=graph-rest-beta#request-body
In the sample file, the problem is caused by the standalone ` in this line of the table (left of "console"):
consoleAccessDisabled | Boolean | Whether the Other user will disregard use of the `>console> special user name. |
---|
So the issue for that specific file can be solved by just removing the extra backtick, but we'll have to investigate what's going on overall.
Nice catch! I could forward to the relevant folks to see if we could escape the problematic ` in the API generation process.
GitHub seems to be rendering that table just fine, I'm not sure if this is another difference between GFM and CommonMark.
GitHub seems to be rendering that table just fine, I'm not sure if this is another difference between GFM and CommonMark.
Yeah, it's a known issue. GitHub is not using the same logic to decode a table. I started before they did their work on CommonMark and we didn't choose the same behavior.
Also, the code for parsing a grid table in markdig is awful to work with, I'm not really proud of it 😅 , so if anyone is interested in making an implementation that is GitHub compliant, you are welcome 😉
I was trying to understand this a bit more as the error scenario behavior was not consistent. I think I understand the root cause by sifting through the code.
Doesnot work:
| Header 1 | Another header here | This is a long header |
| -------- | ------------------- | --------------------- |
| Some data | Some more data | data |
| data | Some long data here | more data |
| Some data | Some more data | data |
| data | Some long data here | more data |
| Some data | Some more data | data |
| data | Some long data here | more data |
| Some data | Some more data | data |
| data | Some long data here | more data |
| Some data | Some more data | data |
| data | Some long data here | more data |
| Some data | Some more data | data |
| data | Some long data here | more data |
| Some data | Some more data | data |
| data | Some long data here | more data |
A basic table
Works:
| Header 1 | Another header here | This is a long header |
| -------- | ------------------- | --------------------- |
| Some data | Some more data | data |
| data | Some long data here | more data |
| Some data | Some more data | data |
| data | Some long data here | more data |
| Some data | Some more data | data |
| data | Some long data here | more data |
| Some data | Some more data | data |
| data | Some long data here | more data |
| Some data | Some more data | data |
| data | Some long data here | more data |
| Some data | Some more data | data |
| data | Some long data here | more data |
| Some data | Some more data | data |
| data | Some long data here | more data |
A basic table
Please notice how i have a new line character explicitly after the table has ended. This helped the BlockParser to put the A basic table
into a seperate block as opposed to including it with the table itself.
Let me know if this makes sense.