rouge icon indicating copy to clipboard operation
rouge copied to clipboard

Proposed fix for potential ReDoS vulnerability in the sed lexer

Open yoshi389111 opened this issue 9 months ago • 0 comments

While investigating the performance issue raised in Issue #2057 regarding sed syntax highlighting, I discovered what appears to be a ReDoS-like vulnerability.

  • fix #2057

The issue seems to result from a combination of factors, and I’ve identified the following two potential problems:

  • The parsing of s and y commands appears to have a ReDoS-like vulnerability.
  • For commands with labels (such as : or t), Rouge currently requires a space between the command and the label, which is not necessary in sed.

Issue with parsing s and y commands

When s or y commands are invalid, and multiple backslashes follow them, the number of backtracking operations in the regular expression increases exponentially. I believe the cause lies in the edot regular expression defined in lib/rouge/lexers/sed.rb.

https://github.com/rouge-ruby/rouge/blob/bf007b7382070f33f49350c48502b47ee2aaab20/lib/rouge/lexers/sed.rb#L73-L79

https://github.com/rouge-ruby/rouge/blob/bf007b7382070f33f49350c48502b47ee2aaab20/lib/rouge/lexers/sed.rb#L91

The current edot is defined as /\\.|./m, where the \ character matches both sides of the or operator, causing backtracking to increase at an exponential rate when there are many backslashes (note that using the non-greedy *? does not reduce the number of backtracking operations).

If the number of backslashes is in the low 20s, it can be processed relatively quickly, but if there are more than 30, it seems to take a considerable amount of time.

A simple fix would be to change the definition of edot to /\\.|[^\\]/m. I tested this change, and it resulted in a significant reduction in processing time. Additionally, all other test cases passed successfully.

Before the fix:

$ time (echo "s/$(printf '\\a%.0s' $(seq 1 30))" | bin/rougify highlight -l sed)
s/\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a

real    1m27.914s
user    1m27.878s
sys     0m0.069s

After the fix:

$ time (echo "s/$(printf '\\a%.0s' $(seq 1 30))" | bin/rougify highlight -l sed)
s/\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a

real    0m0.422s
user    0m0.333s
sys     0m0.054s
$ time (echo "s/$(printf '\\a%.0s' $(seq 1 300))" | bin/rougify highlight -l sed)
s/\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a\a

real    0m0.440s
user    0m0.380s
sys     0m0.055s

Issue with spaces after commands with labels

In sed, spaces after commands with labels like : or t are optional. However, in Rouge, a space is currently required. As a result, in the reported issue, the : command is not correctly recognized, and the label that follows it is mistakenly treated as part of the command. This causes the lexer to incorrectly parse it as an incomplete s command, leading to excessive processing time.

https://github.com/rouge-ruby/rouge/blob/bf007b7382070f33f49350c48502b47ee2aaab20/lib/rouge/lexers/sed.rb#L124-L125

This pull request does not address the optional space after label commands.

yoshi389111 avatar Mar 22 '25 14:03 yoshi389111