Kramdown freezes with specific input.
Hello. We have noticed that kramdown has a hard time parsing certain strings.
Example:
text = "___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe ___QWE ___qwe ___qwe ___qwe ___qwe"
Kramdown::Document.new(text)
takes around 30 seconds on my machine. And if I double the string with the same pattern, it never finishes. Is there a workaround or an option that would help with processing this markdown?
Kramdown version - 2.4.0
The reason for this is backtracking. For the first three underscores, kramdown tries to find the matching pair. When encountering the second three underscores, it sees that those are not the matching ones. So it starts with those three to find their matching counterpart. And on, and on, and on it goes... until the end of the text, where we see that nothing matches. So all nested matching attempts are abandoned and other things are tried, until we come to the first three underscores again. Then we see that they should be handled as normal text. After that we find the second three underscores and everything starts again...
I will have a look at the code.
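The backtracking described above can be illustrated with a toy model. This is only a sketch and not kramdown's actual parser: `attempts_from` and `total_attempts` are hypothetical names, and the model assumes an input made only of openers with no valid closer, where every later opener triggers its own nested attempt.

```ruby
# Toy model of nested delimiter matching; NOT kramdown's real implementation.
# Every token is an opener (:open) and none has a closing partner, which
# mirrors the reported input where no "___" finds a valid closer.
def attempts_from(tokens, start, counter)
  counter[:attempts] += 1 # one attempt to match the opener at `start`
  i = start + 1
  while i < tokens.length
    # A later opener starts its own nested attempt, which fails as well.
    attempts_from(tokens, i, counter) if tokens[i] == :open
    i += 1
  end
  false # reached the end of the input without finding a closer
end

def total_attempts(opener_count)
  tokens = Array.new(opener_count, :open)
  counter = { attempts: 0 }
  # Once an opener has failed it is treated as text and the next one is tried.
  tokens.each_index { |i| attempts_from(tokens, i, counter) }
  counter[:attempts]
end

[1, 2, 3, 10].each { |n| puts "#{n} openers: #{total_attempts(n)} attempts" }
```

In this model each additional opener roughly doubles the number of attempts. The real parser may prune some of this work, so the numbers are not a prediction of kramdown's runtime, only an illustration of why doubling the input can turn a slow parse into one that does not finish.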
When encountering the second three underscores, it sees that those are not the matching ones
Why are the second three underscores considered not "matching"? Is it intentional?
I have tried https://markdowntohtml.com/ for this text, and it considers them as matching if I understand correctly.
So, if I understand correctly:
___1 ___QWE ___2 ___
In this example, 1 and 2 should be formatted.
Underscores only match at word boundaries and the stopping delimiter must not be preceded by a space (see https://kramdown.gettalong.org/syntax.html#emphasis). So yes, that is intentional.
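As a rough illustration of that rule, the sketch below checks each run of three underscores in the example string as it appears above. `can_close?` is a hypothetical helper, not part of kramdown's API, and it encodes only a simplified reading of the documented rule: a closing run must not be preceded by whitespace and must not be followed directly by a word character.

```ruby
# Hypothetical helper, not kramdown's code: a simplified reading of the rule
# for a run of three underscores that starts at `index`.
def can_close?(text, index)
  before = index.zero? ? nil : text[index - 1]
  after  = text[index + 3]
  # A closing delimiter must follow a non-space character.
  return false if before.nil? || before.match?(/\s/)
  # It must also sit at a word boundary (end of text or a non-word character).
  after.nil? || !after.match?(/[[:alnum:]]/)
end

sample = "___1 ___QWE ___2 ___"

# Collect the start offset of every "___" run in the sample.
offsets = []
pos = 0
while (found = sample.index("___", pos))
  offsets << found
  pos = found + 3
end

closers = offsets.select { |i| can_close?(sample, i) }
puts "runs at #{offsets.inspect}, valid closers at #{closers.inspect}"
```

Under this simplified reading, every run in the sample is either at the start of the text or preceded by a space, so none of them qualifies as a closing delimiter.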