Proposed fix for potential ReDoS vulnerability in the sed lexer
While investigating the performance issue raised in Issue #2057 regarding sed syntax highlighting, I discovered what appears to be a ReDoS-like vulnerability.
- fix #2057
The issue seems to result from a combination of factors, and I’ve identified the following two potential problems:
- The parsing of
sandycommands appears to have a ReDoS-like vulnerability. - For commands with labels (such as
:ort), Rouge currently requires a space between the command and the label, which is not necessary in sed.
Issue with parsing s and y commands
When s or y commands are invalid, and multiple backslashes follow them, the number of backtracking operations in the regular expression increases exponentially. I believe the cause lies in the edot regular expression defined in lib/rouge/lexers/sed.rb.
https://github.com/rouge-ruby/rouge/blob/bf007b7382070f33f49350c48502b47ee2aaab20/lib/rouge/lexers/sed.rb#L73-L79
https://github.com/rouge-ruby/rouge/blob/bf007b7382070f33f49350c48502b47ee2aaab20/lib/rouge/lexers/sed.rb#L91
The current edot is defined as /\\.|./m, where the \ character matches both sides of the or operator, causing backtracking to increase at an exponential rate when there are many backslashes (note that using the non-greedy *? does not reduce the number of backtracking operations).
If the number of backslashes is in the low 20s, it can be processed relatively quickly, but if there are more than 30, it seems to take a considerable amount of time.
A simple fix would be to change the definition of edot to /\\.|[^\\]/m. I tested this change, and it resulted in a significant reduction in processing time. Additionally, all other test cases passed successfully.
Before the fix:
$ time (echo "s/$(printf '\\a%.0s' $(seq 1 30))" | bin/rougify highlight -l sed)
s/\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a
real 1m27.914s
user 1m27.878s
sys 0m0.069s
After the fix:
$ time (echo "s/$(printf '\\a%.0s' $(seq 1 30))" | bin/rougify highlight -l sed)
s/\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a
real 0m0.422s
user 0m0.333s
sys 0m0.054s
$ time (echo "s/$(printf '\\a%.0s' $(seq 1 300))" | bin/rougify highlight -l sed)
s/\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a
real 0m0.440s
user 0m0.380s
sys 0m0.055s
Issue with spaces after commands with labels
In sed, spaces after commands with labels like : or t are optional. However, in Rouge, a space is currently required. As a result, in the reported issue, the : command is not correctly recognized, and the label that follows it is mistakenly treated as part of the command. This causes the lexer to incorrectly parse it as an incomplete s command, leading to excessive processing time.
https://github.com/rouge-ruby/rouge/blob/bf007b7382070f33f49350c48502b47ee2aaab20/lib/rouge/lexers/sed.rb#L124-L125
This pull request does not address the optional space after label commands.