
<img> srcset attribute encoding spaces before width descriptor to %20

Open pickle-weasle opened this issue 8 years ago • 4 comments

From looking through the HTML reference, I can see that the srcset attribute of the <img> element should accept:

A list of one or more strings separated by commas indicating a set of possible image sources for the user agent to use. Each string is composed of:

a URL to an image,
**optionally, whitespace followed by one of:**
    a width descriptor, that is, a positive integer directly followed by 'w'. The width descriptor is divided by the source size given in the sizes attribute to calculate the effective pixel density.
    a pixel density descriptor, which is a positive floating point number directly followed by 'x'.

An example of this could be srcset="https://developer.cdn.mozilla.net/static/img/beast-404.ce38fcf80386.png 1000w".
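
To make the quoted format concrete, here is a minimal sketch of splitting a srcset value into URL and descriptor pairs. It is not the sanitizer's code: the class `SrcsetCandidates` and its methods are hypothetical names, and it assumes URLs contain no commas (the real HTML parsing algorithm is more lenient), so treat it as an illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class SrcsetCandidates {

    /** One srcset entry: a URL plus an optional descriptor such as "1000w" or "2x". */
    public static final class Candidate {
        public final String url;
        public final String descriptor; // empty string when no descriptor is given

        Candidate(String url, String descriptor) {
            this.url = url;
            this.descriptor = descriptor;
        }
    }

    /** Splits on commas, then separates each entry at its first run of whitespace. */
    public static List<Candidate> parse(String srcset) {
        List<Candidate> out = new ArrayList<>();
        // Assumption: no commas inside URLs, so a plain split is good enough here.
        for (String part : srcset.split(",")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] pieces = trimmed.split("\\s+", 2);
            String descriptor = pieces.length > 1 ? pieces[1].trim() : "";
            out.add(new Candidate(pieces[0], descriptor));
        }
        return out;
    }

    /** Accepts an empty descriptor, a width ("1000w"), or a pixel density ("2x", "1.5x"). */
    public static boolean isValidDescriptor(String d) {
        return d.isEmpty()
                || d.matches("[1-9][0-9]*w")
                || d.matches("(?:[0-9]+(?:\\.[0-9]+)?|\\.[0-9]+)x");
    }

    public static void main(String[] args) {
        String srcset =
                "https://developer.cdn.mozilla.net/static/img/beast-404.ce38fcf80386.png 1000w";
        for (Candidate c : parse(srcset)) {
            System.out.println(c.url + " | " + c.descriptor
                    + " | valid=" + isValidDescriptor(c.descriptor));
        }
    }
}
```

The point is that the whitespace before `1000w` is a separator between two fields, not part of the URL, so it should not be percent-encoded.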

I recently updated my Java HTML Sanitizer release version and noticed that there is now a bug. The sanitizer treats the space and the width descriptor as part of the URL and percent-encodes the space, like so: srcset="https://developer.cdn.mozilla.net/static/img/beast-404.ce38fcf80386.png%201000w"

I'm guessing this is related somehow: https://github.com/OWASP/java-html-sanitizer/issues/20

pickle-weasle, Apr 18 '17 12:04

Yep. Will add handling for groups of URLs.

mikesamuel, May 12 '17 15:05
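
One way "handling for groups of URLs" could look is sketched below: apply a URL policy to each URL in the list on its own, then rejoin it with its descriptor using a literal space. This is only an assumption about the approach, not the library's implementation. `SrcsetRewriter`, `rewrite`, and `allowHttpOnly` are hypothetical names, and the sketch assumes URLs contain no commas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class SrcsetRewriter {

    /**
     * Applies urlPolicy to each URL in a srcset value. The policy returns the
     * URL to emit, or null to drop that entry. Descriptors are passed through
     * unvalidated in this sketch.
     */
    public static String rewrite(String srcset, Function<String, String> urlPolicy) {
        List<String> kept = new ArrayList<>();
        // Assumption: no commas inside URLs, so a plain split is good enough here.
        for (String part : srcset.split(",")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] pieces = trimmed.split("\\s+", 2);
            String safeUrl = urlPolicy.apply(pieces[0]);
            if (safeUrl == null) {
                continue; // the policy rejected this URL
            }
            // Rejoin with a literal space so the descriptor stays a separate field.
            kept.add(pieces.length > 1 ? safeUrl + " " + pieces[1].trim() : safeUrl);
        }
        return String.join(", ", kept);
    }

    /** Example policy: keep only http and https URLs. */
    public static String allowHttpOnly(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        return (lower.startsWith("http://") || lower.startsWith("https://")) ? url : null;
    }

    public static void main(String[] args) {
        String in = "https://example.com/a.png 1000w, javascript:alert(1) 2x, "
                + "https://example.com/b.png 2x";
        // Prints: https://example.com/a.png 1000w, https://example.com/b.png 2x
        System.out.println(rewrite(in, SrcsetRewriter::allowHttpOnly));
    }
}
```

A real fix would also need to validate the descriptors and follow the HTML specification's srcset parsing rules rather than a simple comma split.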

I hit the same issue when I upgraded OWASP-html-sanitizer.jar to the latest version. I have verified that it was introduced in the 20160614.1 release (it worked in the 20160526.1 release).

Here is my program:

import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

public class URLSanitizationTest {

	static void sanitizeURL(String url) {
		// Set up the rules for the sanitizer: allow <img src> with http, https, and file URLs.
		PolicyFactory pf = new HtmlPolicyBuilder()
				.allowUrlProtocols("http", "https", "file")
				.allowElements("img")
				.allowAttributes("src").onElements("img")
				.toFactory();

		// The sanitizer works better when given a complete tag.
		final String prefix = "<img src=\"";
		final String suffix = "\" />";
		String input = prefix + url + suffix;

		// Sanitize.
		String output = pf.sanitize(input);

		System.out.println("Sanitized URL:" + output);
	}

	public static void main(String[] args) {
		// A URL containing a space, followed by an attempted attribute injection.
		String url = "http://www.mks.com/image s/en/logob.gif onload=\"alert('hi')\"@url\"";

		sanitizeURL(url);
	}
}

Output before 20160614.1 release: Sanitized URL:<img src="http://www.mks.com/image s/en/logob.gif onload&#61;" />

Output since 20160614.1 release: Sanitized URL:<img src="http://www.mks.com/image%20s/en/logob.gif%20onload&#61;" />

I am not sure whether this is expected behavior (and if so, why?) or a bug.

Kish-Jadhav, Jan 12 '18 14:01

Image src before the 20160614.1 release: "http://www.mks.com/image s/en/logob.gif onload="

Image src after the 20160614.1 release: "http://www.mks.com/image%20s/en/logob.gif%20onload="

Note that in the first output the = is actually emitted as the entity &#61; (an ampersand followed by #61;).

Kish-Jadhav, Jan 12 '18 14:01

Can this issue be closed now?

csware, Jan 31 '24 09:01