java-html-sanitizer
java-html-sanitizer copied to clipboard
New methods for excluding elements with specific missing or empty attributes
We recently came across a problem where an img-element with missing src-attribute caused our PDF engine to break. As the current version of the html sanitizer only contains a method for disallowing elements without any attributes, we suggest adding methods to target a specific missing/empty attribute. As a convenience we also included a method to check for a non-matching regex pattern.
Thank you very, very much for submitting a patch with your report. More from Mike soon!
Aloha,
Jim Manico @Manicode
On Oct 9, 2015, at 4:53 PM, FORUM Gesellschaft für Informationssicherheit mbH [email protected] wrote:
We recently came across a problem where an img-element with missing src-attribute caused our PDF engine to break. As the current version of the html sanitizer only contains a method for disallowing elements without any attributes, we suggest adding methods to target a specific missing/empty attribute. As a convenience we also included a method to check for a non-matching regex pattern.
You can view, comment on, or merge this pull request online at:
https://github.com/OWASP/java-html-sanitizer/pull/45
Commit Summary
+Added HtmlPolicyBuilder methods for excluding elements with specific missing or emtpy attributes Reverted changes added methods for excluding elements with specific empty or missing attributes or elements that do NOT match a pattern File Changes
M src/main/java/org/owasp/html/HtmlPolicyBuilder.java (49) Patch Links:
https://github.com/OWASP/java-html-sanitizer/pull/45.patch https://github.com/OWASP/java-html-sanitizer/pull/45.diff — Reply to this email directly or view it on GitHub.
This change seems to be missing any tests. Could you at least check that it works with the kind of policy you want to work.
We recently came across a problem where an img-element with missing src-attribute caused our PDF engine to break. As the current version of the html sanitizer only contains a method for disallowing elements without any attributes, we suggest adding methods to target a specific missing/empty attribute. As a convenience we also included a method to check for a non-matching regex pattern.
What PDF engine is this? If you don't maintain it then it might be worth filing a bug with them as well.
Should this change also require src="..." on <img> in Sanitizers.IMAGES?
Hello Mike,
sorry for the wait.
Our goal was indeed to be able to require certain attributes when their element would make no sense or as in this case even cause harm without them, which was not possible with the current allowAttributes()-implementation.
Your proposed mimic of a chainable required() seems indeed a cleaner approach. Yet my understanding of the project code is currently not sufficient to change the implementation to that within reasonable effort. I will however try and provide a test case to show the validity of the current implementation shortly. Maybe then you could refactor it to the required()-mimic as you find the time.
We use Apache FOP to export PDFs from our web application.
As you mention, it might actually be useful to require the attribute in the standard sanitizer so others do not run into that kind of issue unsuspectingly. At least no valid use case comes to mind for an img without a src that would forbid it from being required.
Best regards, Sebastian
Hello Mike,
I added a TestCase for the disallowWithoutAttribute() functionality that represents the constellation that caused us problems (arbitrary alt, missing src).
Best regards, Sebastian
It looks like this is mostly resolved. @mikesamuel - can we merge in the test case or resolve this pull request somehow? It does not look like major work is needed to resolve this..
Bump @mikesamuel ?
Sorry for ghosting. I'll take a look shortly.
{dis,}allowWithoutAttributes were reworked recently. https://github.com/OWASP/java-html-sanitizer/blob/main/change_log.md says
Release 20200615.1
- Change .and when combining two policies to respect explicit skipIfEmpty decisions.
which affected
https://github.com/OWASP/java-html-sanitizer/blob/ca406970b74abd03ec045713b3c9f53963b716ff/src/main/java/org/owasp/html/HtmlPolicyBuilder.java#L314-L343
IIUC, what you need is a way to drop specific elements that are missing some core attributes.
The following passes for me, which uses an ElementPolicy to reject <img> tags that don't have "src" in a key position. It may spuriously drop <img alt="src" src="foo.png"> but works as a POC.
String input = "<img src=foo.png><img alt=bar>";
PolicyFactory imgOkWithSrc = new HtmlPolicyBuilder()
.allowElements(
((elementName, attrs) -> (attrs.indexOf("src") & 1) == 0 ? elementName : null),
"img"
)
.allowAttributes("alt", "src").onElements("img")
.toFactory();
assertEquals("<img src=\"foo.png\" />", imgOkWithSrc.sanitize(input));