antisamy
antisamy copied to clipboard
Regex named: "Paragraph", is causing "StackOverFlowError"
Hi, When provided with a lengthy content in the 'alt' attribute of the 'img' tag, "StackOverFlowError" is thrown. On debugging the issue, it seems that the regex used to validate the data inside the 'alt' tag is not properly configured and causing excessive recursions which resulted in a "StackOverFlowError".
Vulnerable Regex:
Sample Input: <img border="0" width="320" height="200" style="width:3.368in;height:2.0486in" id="id_123" src="/url/uri" alt="SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText SampleText ">
Reference: Similar Issue found in StackOverFlow Forum
@spassarop - Can you research and possibly propose a fix?
There may a ReDoS vulnerability here. Usually, you can rewrite the regex to get rid of it.
This may help, https://devina.io/redos-checker but there are others ReDoS detectors as well if you search for them.
Also if you are not doing so, specifying a max # of occurrences of the pattern may help.
-kevin
On Wed, Dec 6, 2023, 9:30 AM Dave Wichers @.***> wrote:
@spassarop https://github.com/spassarop - Can you research and possibly propose a fix?
— Reply to this email directly, view it on GitHub https://github.com/nahsra/antisamy/issues/402#issuecomment-1843000126, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAO6PGZJ3FI3VIFKUSNWVULYIB6RDAVCNFSM6AAAAABAI5CHRKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNBTGAYDAMJSGY . You are receiving this because you are subscribed to this thread.Message ID: @.***>
I could reproduce extending the length of the provided sample input.
The problem lies in the "paragraph" regexp that applies to "alt" and other attributes: ([\p{L}\p{N},'\.\s\-_\(\)]|&[0-9]{2};)*
In essence, it allow for any alphanumeric character and a few more exceptions that you can see before the pipe character. On the right of the pipe it allows HTML entities like &36;. The problem is this regexp has the format (A|B)* and Java in particular does not handle it well in the recursion when the input is "large".
We could limit the number of occurrences, I guess something like {0,X}. But on the left side it's a character and on the right it's 4. A bit unbalanced, but maybe I misinterpreted.
Other solution that I think it's more comprehensive is the other one that Kevin proposed, changing the regex to allow the same characters, like: [\p{L}\p{N},'.\s\-_\(\)&;]*
The new regexp would visually lose the semantic part of allowing HTML entities, but if you mentally arrange the allowed characters you can see they are still writable. This removes the (A|B)* format and does not throw errors.
Do you agree? If you do, the change is only replacing the regexp in the four policies it's present.
Hi @spassarop is there any safe alternate regex, that we can use for the "alt" attribute.
[\p{L}\p{N},'.\s\-_\(\)&;]* using & for & when you put it in the XML of the policy file.
Also, I just checked and “paragraph” is the only regex following the (A|B)* pattern. Should be the only one to replace.
@BloodDrag0n - The fix for this will go out in the 1.7.5 release, which hopefully will be this week.
This was fixed in the 1.7.5 release.