webarchive-commons

Prevent stack overflow by limiting the length of the matched pattern

Open sebastian-nagel opened this issue 6 years ago • 1 comment

The pattern used to match CSS-embedded URLs has no length limit, i.e. it matches URLs of any length, which can cause a Java stack overflow (see commoncrawl/ia-web-commons#12).

This PR fixes the issue and adds a unit test that reproduces the problem and verifies the fix.
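To illustrate the general idea only (this is not the patch itself), the sketch below caps the URL part of a CSS `url(...)` match with a bounded quantifier. The class name `BoundedCssUrlPattern`, the limit of 2048 characters and the regex are all assumptions made for the example; the actual pattern in webarchive-commons may look different.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not the pattern used in webarchive-commons.
final class BoundedCssUrlPattern {
    // Assumed upper bound on the URL length; the real limit may differ.
    static final int MAX_URL_LENGTH = 2048;

    // url( optional-quote URL same-quote ), with the URL part limited to
    // MAX_URL_LENGTH characters instead of an unbounded repetition.
    static final Pattern CSS_URL = Pattern.compile(
        "url\\(\\s*(['\"]?)([^)'\"\\s]{1," + MAX_URL_LENGTH + "})\\1\\s*\\)",
        Pattern.CASE_INSENSITIVE);

    // Returns the first CSS-embedded URL, or null if none matches.
    static String extractUrl(String css) {
        Matcher m = CSS_URL.matcher(css);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractUrl("body { background: url('img/bg.png'); }"));
    }
}
```

With a bound in place, an overlong candidate simply fails to match instead of driving the matcher through an arbitrarily long input. Note that this sketch excludes whitespace from the URL part.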

sebastian-nagel avatar Oct 22 '19 14:10 sebastian-nagel

Looks like this patch also disallows whitespace within the URL? Under the old pattern, url('foo bar') matched, but under the new pattern it does not. According to MDN's documentation, whitespace should be allowed if the URL is quoted:

Quotes are required if the URL includes parentheses, whitespace, or quotes, unless these characters are escaped, or if the address includes control characters above 0x7e.
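One possible way to keep a length limit while still accepting whitespace inside quoted URLs is to use separate alternatives for the single-quoted, double-quoted and unquoted forms. The following is only a sketch under assumptions: the class name `QuotedCssUrlPattern`, the limit of 2048 and the regex are hypothetical and not taken from the patch, and escaped characters are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not the pattern used in webarchive-commons.
final class QuotedCssUrlPattern {
    // Assumed upper bound on the URL length.
    static final int MAX_URL_LENGTH = 2048;
    private static final String BOUND = "{1," + MAX_URL_LENGTH + "}";

    static final Pattern CSS_URL = Pattern.compile(
        "url\\(\\s*(?:"
        + "'([^']" + BOUND + ")'"          // single-quoted: whitespace allowed
        + "|\"([^\"]" + BOUND + ")\""      // double-quoted: whitespace allowed
        + "|([^)'\"\\s]" + BOUND + ")"     // unquoted: no whitespace, quotes or ')'
        + ")\\s*\\)",
        Pattern.CASE_INSENSITIVE);

    // Returns the first CSS-embedded URL, or null if none matches.
    static String extractUrl(String css) {
        Matcher m = CSS_URL.matcher(css);
        if (!m.find()) {
            return null;
        }
        // Exactly one of the three capture groups participates in a match.
        for (int g = 1; g <= 3; g++) {
            if (m.group(g) != null) {
                return m.group(g);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extractUrl("url('foo bar')"));
    }
}
```

Here url('foo bar') matches again, while url(foo bar) without quotes still does not, and each alternative stays bounded.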

ato avatar Oct 25 '19 02:10 ato