java-html-sanitizer
java-html-sanitizer copied to clipboard
about PolicyFactory.and(PolicyFactory f) job
I organized the results made through PolicyFactory.and(PolicyFactory f)
What I want to know exactly : with afterPolicy = beforePolicy.and(newPolicy) how has the afterPolicy changed from beforePolicy?
1. allowWithoutAttributes disallowWithoutAttributes
we've investigated in #197
2. allowElements disallowElements
| beforePolicy | afterPolicy | result | |
|---|---|---|---|
| X | X | nothing allowed | no problem |
| X | allow | beforePolicy tag all remove, afterPolicy tag alive(only added tags) | no problem |
| X | disallow | beforePolicy tag all remove, afterPolicy tag all remove | no problem |
| allow | X | beforePolicy tag alive(only added tags), afterPolicy tag alive(only added tags) | no problem |
| allow | allow | only allowed elements are allowed. | no problem |
| allow | disallow | If I allow p tag in beforePolicy and disallow p tag in newPolicy, result afterPolicy allow p tag | I think that disallowing p tag in afterPolicy is right |
// beforePolicy : allow
// afterPolicy : disallow
@Test
public void testAllowElements6() {
PolicyFactory beforePolicy = new HtmlPolicyBuilder()
.allowElements("p")
.toFactory();
String input = "<p>pText</p>";
assertEquals("<p>pText</p>", beforePolicy.sanitize(input));
PolicyFactory newPolicy = new HtmlPolicyBuilder()
.disallowElements("p")
.toFactory();
PolicyFactory afterPolicy = beforePolicy.and(newPolicy);
assertEquals("<p>pText</p>", afterPolicy.sanitize(input));
}
| beforePolicy | afterPolicy | result | |
|---|---|---|---|
| disallow | X | beforePolicy tag remove, afterPolicy tag remove | no problem |
| disallow | allow | beforePolicy tag remove, afterPolicy tag alive(only added tags) | no problem |
| disallow | disallow | beforePolicy tag remove, afterPolicy tag remove | no problem |
3. allowUrlProtocols disallowUrlProtocols
| beforePolicy | afterPolicy | result | |
|---|---|---|---|
| X | X | nothing allowed protocols | no problem |
| X | allow | nothing allowed protocols | I think that afterPolicy allowing http protocol is right |
// before : X
// after : allow
@Test
public void testAllowUrlProtocol2() {
PolicyFactory beforePolicy = new HtmlPolicyBuilder()
.allowElements("a")
.allowAttributes("href")
.onElements("a")
.toFactory();
String input = "<a href='http://example.com'>Hi</a>";
assertEquals("Hi", beforePolicy.sanitize(input));
PolicyFactory newPolicy = new HtmlPolicyBuilder()
.allowElements("a")
.allowAttributes("href")
.onElements("a")
.allowUrlProtocols("http")
.toFactory();
PolicyFactory afterPolicy = beforePolicy.and(newPolicy);
assertEquals("Hi", afterPolicy.sanitize(input));
}
| beforePolicy | afterPolicy | result | |
|---|---|---|---|
| X | disallow | beforePolicy tag all remove, afterPolicy tag all remove | no problem |
| allow | X | If I allow http protocol in beforePolicy and do nothing in newPolicy, afterPolicy result disallow http protocol | In order to be consistent with the previous strategy, I think that afterPolicy allow http protocol is right |
// before : allow
// after : X
@Test
public void testAllowUrlProtocol4() {
PolicyFactory beforePolicy = new HtmlPolicyBuilder()
.allowElements("a")
.allowAttributes("href")
.onElements("a")
.allowUrlProtocols("http")
.toFactory();
String input = "<a href='http://example.com'>Hi</a>";
assertEquals("<a href=\"http://example.com\">Hi</a>", beforePolicy.sanitize(input));
PolicyFactory newPolicy = new HtmlPolicyBuilder()
.allowElements("a")
.allowAttributes("href")
.onElements("a")
.toFactory();
PolicyFactory afterPolicy = beforePolicy.and(newPolicy);
assertEquals("Hi", afterPolicy.sanitize(input));
}
| beforePolicy | afterPolicy | result | |
|---|---|---|---|
| allow | allow | If I allow http protocol in beforePolicy and allow https protocol in newPolicy, result afterPolicy is not allow http | In order to be consistent with the previous strategy, I think that afterPolicy allow http protocol is right |
// before : allow
// after : allow
@Test
public void testAllowUrlProtocol5_1() {
PolicyFactory beforePolicy = new HtmlPolicyBuilder()
.allowElements("a")
.allowAttributes("href")
.onElements("a")
.allowUrlProtocols("http")
.toFactory();
String input = "<a href='http://example.com'>Hi</a>";
assertEquals("<a href=\"http://example.com\">Hi</a>", beforePolicy.sanitize(input));
PolicyFactory newPolicy = new HtmlPolicyBuilder()
.allowElements("a")
.allowAttributes("href")
.onElements("a")
.allowUrlProtocols("https")
.toFactory();
PolicyFactory afterPolicy = beforePolicy.and(newPolicy);
assertEquals("Hi", afterPolicy.sanitize(input));
}
| beforePolicy | afterPolicy | result | |
|---|---|---|---|
| allow | disallow | beforePolicy(allow) and afterPolicy(disallow) | no problem |
| disallow | X | beforePolicy(disallow) and afterPolicy(disallow) | no problem |
| disallow | allow | If i disallow http protocol in beforePolicy and allow http protocol in newPolicy, result afterPolicy is not allow http | I think that afterPolicy allow http protocol is right |
// before : disallow
// after : allow
@Test
public void testAllowUrlProtocol8() {
PolicyFactory beforePolicy = new HtmlPolicyBuilder()
.allowElements("a")
.allowAttributes("href")
.onElements("a")
.disallowUrlProtocols("http")
.toFactory();
String input = "<a href='https://example.com'>Hi</a>";
assertEquals("Hi", beforePolicy.sanitize(input));
PolicyFactory newPolicy = new HtmlPolicyBuilder()
.allowElements("a")
.allowAttributes("href")
.onElements("a")
.allowUrlProtocols("http")
.toFactory();
PolicyFactory afterPolicy = beforePolicy.and(newPolicy);
assertEquals("Hi", afterPolicy.sanitize(input));
}
| beforePolicy | afterPolicy | result | |
|---|---|---|---|
| disallow | disallow | all disallow | no problem |
4. allowTextIn disallowTextIn
at first, I tried to organize it all at once, but it got complicated. I'll organize it separately in the next issue.
5. allowAttributes disallowAttributes
at first, I tried to organize it all at once, but it got complicated. I'll organize it separately in the next issue.
Etc
and I have an opinion(it's tiny). I want your feedback.
ElementAndAttributePolicies p = e.getValue();
ElementAndAttributePolicies q = f.policies.get(elName);
if (q != null) {
p = p.and(q);
}
above code, we separate each policy using p, q variables.
then we send q variable in ElementAndAttributePolicies and method.
but in ElementAndAttributePolicies and(ElementAndAttributePolicies p)
parameter name is p. It keeps p and q confused.
ElementAndAttributePolicies and(ElementAndAttributePolicies p) {
assert elementName.equals(p.elementName):
elementName + " != " + p.elementName;
...
}
How about we change the naming ElementAndAttributePolicies p -> ElementAndAttributePolicies q?
this method only use in PolicyFactory.and()
if you have better idea, I'd love to hear.
2. allowElements disallowElements case
in PolicyFactory and method,
...
ElementAndAttributePolicies p = e.getValue();
ElementAndAttributePolicies q = f.policies.get(elName);
if (q != null) {
p = p.and(q);
} else {
// Mix in any globals that are not already taken into account in this.
p = p.andGlobals(f.globalAttrPolicies);
}
...
if q is null, that's mean two cases. one is newPolicy disallow tag, the other is newPolicy doesn't set allow/disallow elements. therefore we need to separate them.
if (q != null) { // beforePolicy and newPolicy has same tag
p = p.and(q);
b.put(elName, p);
} else {
if (f.policies.isEmpty()) { // disallow or X(set nothing)
// todo we need to separate disallow case and X(set nothing)
} else {
// Mix in any globals that are not already taken into account in this.
p = p.andGlobals(f.globalAttrPolicies);
b.put(elName, p);
}
}
at first I made below code(but I think that using f.textContainers is not good way).
if (q != null) { // beforePolicy and newPolicy has same tag
p = p.and(q);
b.put(elName, p);
} else {
if (f.policies.isEmpty()) { // => disallow or X
if (f.textContainers.isEmpty()) { // that means newPolicy set nothing
p = p.andGlobals(f.globalAttrPolicies);
b.put(elName, p);
} else {
}
} else {
// Mix in any globals that are not already taken into account in this.
p = p.andGlobals(f.globalAttrPolicies);
b.put(elName, p);
}
}
any way, above code has problem. look below test code. I think only b tag is allowed is right. but I couldn't make it.
// beforePolicy : allow
// afterPolicy : disallow
@Test
public void testAllowElements6_1() {
PolicyFactory beforePolicy = new HtmlPolicyBuilder()
.allowElements("p", "b")
.toFactory();
String input = "<p>pText</p>\n<b>bText</b>\n<i>iText</i>\n<s>sText</s>";
assertEquals("<p>pText</p>\n<b>bText</b>\niText\nsText", beforePolicy.sanitize(input));
PolicyFactory newPolicy = new HtmlPolicyBuilder()
.disallowElements("p")
.toFactory();
PolicyFactory afterPolicy = beforePolicy.and(newPolicy);
assertEquals("pText\n<b>bText</b>\niText\nsText", afterPolicy.sanitize(input));
}
@mikesamuel After all, should this also change the underlying structure? I tried to work with minimal change, but it was blocked.
2. allowElements disallowElements case
I think again, The idea of changing the existing structure is in the wrong order. I tried to solve the problem by adding new fields without modifying the existing structure.
I make two fields(isNotMatchingDisallowElement and isNotCalledAllowElementsMethod).
As I explained in above comment we need to separate disallow case and X(set nothing).
so I use that fields.
code : https://github.com/yangbongsoo/java-html-sanitizer/commit/db03df04eb87e335b50fbeaa927cb5be37918750
if (q != null) {
p = p.and(q);
b.put(elName, p);
} else {
if (f.policies.isEmpty()) {
boolean isNotMatchingDisallowElement = true;
for (String eachDisallowElement : f.disallowElementsName) {
if (elName.equals(eachDisallowElement)) {
isNotMatchingDisallowElement = false;
break;
}
}
if (isNotMatchingDisallowElement || f.isNotCalledAllowElementsMethod) {
p = p.andGlobals(f.globalAttrPolicies);
b.put(elName, p);
}
} else {
// Mix in any globals that are not already taken into account in this.
p = p.andGlobals(f.globalAttrPolicies);
b.put(elName, p);
}
}
3. allowUrlProtocols disallowUrlProtocols case
in below test code, I set allowUrlprotocols(allow) to newPolicy only.
// beforePolicy : X
// newPolicy : allow
@Test
public void testAllowUrlProtocol2() {
PolicyFactory beforePolicy = new HtmlPolicyBuilder()
.allowElements("a")
.allowAttributes("href")
.onElements("a")
.toFactory();
String input = "<a href='http://example.com'>Hi</a>";
assertEquals("Hi", beforePolicy.sanitize(input));
PolicyFactory newPolicy = new HtmlPolicyBuilder()
.allowElements("a")
.allowAttributes("href")
.onElements("a")
.allowUrlProtocols("http")
.toFactory();
PolicyFactory afterPolicy = beforePolicy.and(newPolicy);
assertEquals("Hi", afterPolicy.sanitize(input));
}
so in JoinedAttributePolicy, we had two polices(FilterUrlByProtocolAttributePolicy).
the problem is here. in for loop, first AttributePolicy doesn't have allowed protocols. so value is null. and because value is null we break loop(even if next AttributePolicy has allowed protocols).
public @Nullable String apply(
String elementName, String attributeName, @Nullable String rawValue) {
String value = rawValue;
for (AttributePolicy p : policies) {
if (value == null) { break; }
value = p.apply(elementName, attributeName, value);
}
return value;
}
@mikesamuel
I think that in a = AttributePolicy.Util.join(a, b) logic, we should combine protocols.
but I am not sure we have to change that codes. I need your opinion.
// a is FilterUrlByProtocolAttributePolicy(protocols 0)
AttributePolicy a = e.getValue();
// b is FilterUrlByProtocolAttributePolicy(protocols 1)
AttributePolicy b = q.attrPolicies.get(attrName);
if (b != null) {
a = AttributePolicy.Util.join(a, b);
}
joinedAttrPolicies.put(attrName, a);
Sorry for the delay:
I think that in a = AttributePolicy.Util.join(a, b) logic, we should combine protocols.
Hmm. I wonder whether we could define
interface JoinableAttributePolicy {
/** Null if this does not know how to join with other; otherwise the joined policy. */
AttributePolicy tryJoinWith(JoinableAttributePolicy other);
}
so that AttributePolicy.Util.Join could try to let policies join themselves, and policies that wrap other policies could try to join the policy they wrap.
If I add tryJoinWith method likes below interface in AttributePolicy, StylingPolicy will affect.
/** An attribute policy that is joinable. */
static interface JoinableAttributePolicy extends AttributePolicy, Joinable<JoinableAttributePolicy> {
// Parameterized Appropriately.
/** Null if this does not know how to join with other; otherwise the joined policy. */
AttributePolicy tryJoinWith(JoinableAttributePolicy other);
}
give me a time. I should analyze StylingPolicy at first.