esapi-java-legacy
DefaultEncoder / getCanonicalizedURI returns mix encoding for HTML special characters
Discussed in https://github.com/ESAPI/esapi-java-legacy/discussions/823
Originally posted by krog78 January 19, 2024
Hi,
DefaultEncoder / getCanonicalizedURI reports mixed encoding for HTML special characters in the query string (and does not seem to canonicalize the parameter values, despite this being mentioned):
https://github.com/ESAPI/esapi-java-legacy/blob/2136292c2225ca8ff770d85f41364873218b7ff5/src/main/java/org/owasp/esapi/reference/DefaultEncoder.java#L573
Also, canonicalize is applied to the scheme, host, port and UriSegment.SCHEMSPECIFICPART. Is that really relevant?
Thanks, Regards, Sylvain
Hi, sorry for the delayed response,
We do indeed have both parameters set to false:
Encoder.AllowMultipleEncoding=false
Encoder.AllowMixedEncoding=false
The URL we are using has the form /webapp/ux/home?d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D&newsess=false&roleid=DP010101/0007&origin=ourprogram. This is a relative URL, but I think the problem also occurs with a full URL. The following warnings are logged:
22-Jan-2024 10:03:28.231 AVERTISSEMENT [http-nio-8080-exec-8] org.owasp.esapi.logging.java.JavaLogLevelHandlers.log [SECURITY FAILURE Anonymous:58505@unknown -> 0:0:0:0:0:0:0:1:8080/eTemptation/Encoder] Mixed encoding (2x) detected in /webapp/ux/home?d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D&newsess=false&roleid=DP010101/0007&origin=ourprogram
22-Jan-2024 10:03:52.919 AVERTISSEMENT [http-nio-8080-exec-8] org.owasp.esapi.logging.java.JavaLogLevelHandlers.log [SECURITY FAILURE Anonymous:58505@unknown -> 0:0:0:0:0:0:0:1:8080/eTemptation/Encoder] Mixed encoding (2x) detected in d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D&newsess=false&roleid=DP010101/0007&origin=ourprogram
The warning is produced when seg = SCHEMSPECIFICPART and when seg = QUERY, because of this line (esapi-java-legacy/src/main/java/org/owasp/esapi/reference/DefaultEncoder.java, line 571 in 2136292):
String value = canonicalize(parseMap.get(seg), allowMultiple, allowMixed);
(the full line is canonicalized).
Note: the parameters of canonicalize are named restrictMultiple and restrictMixed, but we are passing allowMultiple and allowMixed. Is that intended?
The first HTMLEntityCodec decodes the string as:
/webapp/ux/home?d=1705914006565&status=login&ticket=1705914653964_thWhiiFp_VESwCkQ-Rq0TU0LZWVKuRxpSUmOzIMsZNCcUIiYGMXX_Q%3D%3D≠wsess=false&roleid=DP010101/0007∨igin=ourprogram
&or has been interpreted as an HTML special character (is that expected? I tested the following code in Chrome, Firefox and Edge, and none of them interpreted the special character: Art and Copy).
How should we validate such URLs (containing HTML special characters)?
Thanks, Regards, Sylvain
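To make the reported transformation concrete, here is a minimal, self-contained sketch of a lenient entity decoder that accepts a named entity without its terminating semicolon. It is not ESAPI's HTMLEntityCodec: the class name, the two-entry table and both method names are assumptions made for illustration only. It reproduces the &newsess and &origin rewrites on the tail of the URL above, while a strict decoder that insists on the semicolon leaves the string alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LenientEntityDemo {
    // Hypothetical two-entry table; a real codec carries the full HTML entity list.
    static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("ne", "\u2260"); // not-equal sign
        ENTITIES.put("or", "\u2228"); // logical-or sign
    }

    // Lenient: decodes "&name" even when the closing ';' is missing.
    static String decodeLenient(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '&') {
                String match = null;
                // Pick the longest entity name that starts right after '&'.
                for (String name : ENTITIES.keySet()) {
                    if (input.startsWith(name, i + 1)
                            && (match == null || name.length() > match.length())) {
                        match = name;
                    }
                }
                if (match != null) {
                    out.append(ENTITIES.get(match));
                    i += 1 + match.length();
                    if (i < input.length() && input.charAt(i) == ';') {
                        i++; // swallow the optional terminator
                    }
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Strict: only "&name;" (with the semicolon) is decoded.
    static String decodeStrict(String input) {
        String result = input;
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            result = result.replace("&" + e.getKey() + ";", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String tail = "&newsess=false&roleid=DP010101/0007&origin=ourprogram";
        System.out.println(decodeLenient(tail));
        System.out.println(decodeStrict(tail));
    }
}
```

With this table, only the & characters that happen to be followed by "ne" or "or" are affected, which is why &status, &ticket and &roleid survive in the decoded string quoted above.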
Moved @Krog78's comment here.
Quick notes:
Unwrapped URL as-is:
/webapp/ux/home?
d=1705914006565
&status=login
&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D
&newsess=false
&roleid=DP010101/0007
&origin=ourprogram
Percent-decoded:
/webapp/ux/home?
d=1705914006565
&status=login
&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q==
&newsess=false
&roleid=DP010101/0007
&origin=ourprogram
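The percent-decoded form above can be reproduced with the JDK alone. This is a small sketch using java.net.URLDecoder (the Charset overload needs Java 10 or later); the class and method names are made up for the example. Note that URLDecoder follows form-encoding rules and also turns '+' into a space, which does not matter for this ticket value but would for others.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PercentDecodeDemo {
    // Decodes %XX escapes as UTF-8 (and '+' as space, per form encoding).
    static String percentDecode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ticket =
            "1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D";
        // The trailing %3D%3D becomes "==".
        System.out.println(percentDecode(ticket));
    }
}
```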
Searching both unwrapped versions for HTML entities finds nothing.
Found it. As discussed in #823, the first call, which canonicalizes the entire query string, goes into the canonicalize method on line 541 and generates the false positive.
Further research is necessary to determine exactly what is being detected because sweeping the input against standard HTML decoding (NOT ESAPI) results in zero change to the output. (There's no collision, so what gives?)
Not sure what to make of this one.
HTMLDecode absolutely transforms output here when it's not expected to.
Issue 1: the call to canonicalize on line 541 is attempting an early canonicalize in the case of the queries. We're not supposed to touch those until we've split the queries into key/value pairs. This will be resolved by adjusting the logic to place line 541 into the else block that checks to see if we're at the QUERY segment. THAT will partially mitigate the problem by ensuring the check is done at the correct location.
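The idea behind Issue 1 can be sketched as follows: split the query on & first, then canonicalize each key and value on its own, so the decoder never sees a separator followed by a parameter name. This is only an illustration, not the actual patch to DefaultEncoder; the class name, the method name and the UnaryOperator stand-in for ESAPI's canonicalize are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class QueryCanonicalizeSketch {
    // Splits "k1=v1&k2=v2" and applies the given canonicalizer to each
    // key and value separately. The canonicalizer is a hypothetical
    // stand-in for whatever decoding routine is in use.
    static Map<String, String> canonicalizeQuery(String query,
                                                 UnaryOperator<String> canonicalize) {
        Map<String, String> result = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return result;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            result.put(canonicalize.apply(key), canonicalize.apply(value));
        }
        return result;
    }

    public static void main(String[] args) {
        String query = "d=1705914006565&status=login&newsess=false"
                     + "&roleid=DP010101/0007&origin=ourprogram";
        // Identity canonicalizer, just to show the split.
        System.out.println(canonicalizeQuery(query, s -> s));
    }
}
```

Because every fragment handed to the canonicalizer has already had its & separators removed, sequences such as &ne or &or can no longer be formed from a separator plus the start of a parameter name.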
Issue 2: Determine why the input /webapp/ux/home?d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D&newsess=false&roleid=DP010101/0007&origin=ourprogram results in a transformation to /webapp/ux/home?d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D≠wsess=false&roleid=DP010101/0007∨igin=ourprogram with the microscopic view of the text being:
&newsess=false&roleid=DP010101/0007&or
into
≠wsess=false&roleid=DP010101/0007∨
It appears that I solved that by looking at this. The HTML entity Codec is translating &ne into ≠, and then the &or detection is a legitimate bug that I'm staring at right now. But at any rate, combined with the percents in the original input, that's a mixed encoding exception before we even get to the &or.
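As a rough illustration of why the percent escapes plus a decoded &ne add up to a mixed-encoding report, the sketch below counts how many independent decoders change a given input. It is a simplification under stated assumptions, not ESAPI's canonicalize: the class and method names are hypothetical, the HTML decoder is reduced to a single lenient &ne rule, and the "(2x)" in the log is only assumed to correspond to such a count.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class MixedEncodingSketch {
    // Returns how many of the given decoders alter the input.
    // Two or more is treated here as "mixed encoding".
    static int codecsThatChange(String input,
                                Map<String, UnaryOperator<String>> decoders) {
        int count = 0;
        for (UnaryOperator<String> decoder : decoders.values()) {
            if (!decoder.apply(input).equals(input)) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical decoder set: JDK percent decoding plus a one-rule
    // lenient entity decoder that accepts "&ne" without a semicolon.
    static Map<String, UnaryOperator<String>> demoDecoders() {
        Map<String, UnaryOperator<String>> decoders = new LinkedHashMap<>();
        decoders.put("percent", s -> URLDecoder.decode(s, StandardCharsets.UTF_8));
        decoders.put("html-lenient", s -> s.replace("&ne", "\u2260"));
        return decoders;
    }

    public static void main(String[] args) {
        String whole = "ticket=abc%3D%3D&newsess=false";
        System.out.println(codecsThatChange(whole, demoDecoders()));
        System.out.println(codecsThatChange("abc%3D%3D", demoDecoders()));
    }
}
```

Run against the whole query string, both decoders fire; run against the ticket value alone, only the percent decoder does, which is consistent with the false positive disappearing once parameters are checked individually.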
I'm stumped as to why we're translating that &or however. This is just strange.
The false-positive (FP) issue is an easy fix and can go out with the next point release, but the misdetection on &or... who knows. I think that's its own issue.