esapi-java-legacy DefaultEncoder / getCanonicalizedURI returns mix encoding for HTML special characters

Discussed in https://github.com/ESAPI/esapi-java-legacy/discussions/823

Originally posted by krog78 January 19, 2024

Hi,

DefaultEncoder / getCanonicalizedURI reports mixed encoding for HTML special characters in the query string (and does not seem to canonicalize the parameter values, despite the fact that it is mentioned):

https://github.com/ESAPI/esapi-java-legacy/blob/2136292c2225ca8ff770d85f41364873218b7ff5/src/main/java/org/owasp/esapi/reference/DefaultEncoder.java#L573

Also, canonicalize is applied to the scheme, host, port, and UriSegment.SCHEMSPECIFICPART. Is that really relevant?

Thanks, Regards, Sylvain

Jan 22 '24 14:01 xeno6696

Hi, sorry for the delayed response,

We do indeed have both parameters set to false:

Encoder.AllowMultipleEncoding=false
Encoder.AllowMixedEncoding=false

The URL we are using looks like /webapp/ux/home?d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D&newsess=false&roleid=DP010101/0007&origin=ourprogram. This is a relative URL, but I think the problem also occurs with a full URL. The following warnings are written:

22-Jan-2024 10:03:28.231 AVERTISSEMENT [http-nio-8080-exec-8] org.owasp.esapi.logging.java.JavaLogLevelHandlers.log [SECURITY FAILURE Anonymous:58505@unknown -> 0:0:0:0:0:0:0:1:8080/eTemptation/Encoder] Mixed encoding (2x) detected in /webapp/ux/home?d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D&newsess=false&roleid=DP010101/0007&origin=ourprogram

22-Jan-2024 10:03:52.919 AVERTISSEMENT [http-nio-8080-exec-8] org.owasp.esapi.logging.java.JavaLogLevelHandlers.log [SECURITY FAILURE Anonymous:58505@unknown -> 0:0:0:0:0:0:0:1:8080/eTemptation/Encoder] Mixed encoding (2x) detected in d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D&newsess=false&roleid=DP010101/0007&origin=ourprogram

The warning is produced when seg = SCHEMSPECIFICPART and when seg = QUERY, because of this line (the full line is canonicalized):

esapi-java-legacy/src/main/java/org/owasp/esapi/reference/DefaultEncoder.java, line 571 in 2136292:

String value = canonicalize(parseMap.get(seg), allowMultiple, allowMixed);

Note: the parameters of the canonicalize function are named restrictMultiple and restrictMixed, but we are passing allowMultiple and allowMixed. Is that intended?

The first HTMLEntityCodec decodes the string as:

/webapp/ux/home?d=1705914006565&status=login&ticket=1705914653964_thWhiiFp_VESwCkQ-Rq0TU0LZWVKuRxpSUmOzIMsZNCcUIiYGMXX_Q%3D%3D≠wsess=false&roleid=DP010101/0007∨igin=ourprogram

&or has been interpreted as an HTML special character. Is that normal? I tested the following code in Chrome, Firefox, and Edge, and none of them interpreted the special character: Art and Copy.

How should we validate such URLs (containing HTML special characters)?

Thanks, Regards, Sylvain

Moved @Krog78's comment here.

Jan 22 '24 14:01 xeno6696

Quick notes:

Unwrapped URL as-is:

/webapp/ux/home?
d=1705914006565
&status=login
&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D
&newsess=false
&roleid=DP010101/0007
&origin=ourprogram

Percent-decoded:

/webapp/ux/home?
d=1705914006565
&status=login
&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q==
&newsess=false
&roleid=DP010101/0007
&origin=ourprogram

Scanning both unwrapped versions for HTML entities finds nothing.
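The percent-decoding step shown above can be reproduced with the JDK alone. This is only a sketch: the class and method names are hypothetical, it uses `java.net.URLDecoder` (Java 10+ overload) rather than anything from ESAPI, and it assumes the input contains no `+` characters, which `URLDecoder` would turn into spaces.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ESAPI.
final class PercentDecodeSketch {

    // Percent-decode a string using UTF-8. Note: URLDecoder also maps '+' to ' '.
    static String percentDecode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "/webapp/ux/home?d=1705914006565&status=login"
                + "&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D"
                + "&newsess=false&roleid=DP010101/0007&origin=ourprogram";
        // Print one parameter per line, mirroring the unwrapped view above.
        for (String part : percentDecode(raw).split("(?=&)")) {
            System.out.println(part);
        }
    }
}
```

The only change percent-decoding makes to this input is turning the trailing `%3D%3D` of the ticket into `==`.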

Jan 22 '24 14:01 xeno6696

Found it. As discussed in #823, the entire query string is first passed to the canonicalize method on line 541, and that call generates the false positive.

Further research is necessary to determine exactly what is being detected because sweeping the input against standard HTML decoding (NOT ESAPI) results in zero change to the output. (There's no collision, so what gives?)

Jan 22 '24 14:01 xeno6696

Not sure what to make of this one.

HTMLDecode absolutely transforms output here when it's not expected to.

Jan 22 '24 14:01 xeno6696

Issue 1: the call to canonicalize on line 541 attempts an early canonicalization of the query. We're not supposed to touch the query until we've split it into key/value pairs. This will be resolved by finessing the logic to place line 541 into the else block of the check for whether we're at the QUERY segment. THAT will partially mitigate the problem by ensuring the check is done at the correct location.
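A minimal sketch of the "split first, canonicalize second" ordering described in Issue 1. It is not the ESAPI fix: `QuerySplitSketch`, `canonicalizePart`, and `canonicalizeQuery` are hypothetical names, and plain percent-decoding stands in for the real canonicalize call.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not ESAPI code.
final class QuerySplitSketch {

    // Stand-in for the real canonicalizer: percent-decoding only (assumption).
    static String canonicalizePart(String part) {
        return URLDecoder.decode(part, StandardCharsets.UTF_8);
    }

    // Split the raw query on '&' and '=' BEFORE canonicalizing, so the
    // separators are never seen by the decoder as part of the data.
    static Map<String, String> canonicalizeQuery(String rawQuery) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as a trailing '&'
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            out.put(canonicalizePart(key), canonicalizePart(value));
        }
        return out;
    }

    public static void main(String[] args) {
        String query = "d=1705914006565&status=login&newsess=false"
                + "&roleid=DP010101/0007&origin=ourprogram";
        canonicalizeQuery(query).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Because each key and value is handed to the decoder without its leading `&`, sequences such as `&ne` or `&or` never reach it in this ordering.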

Issue 2: Determine why the input /webapp/ux/home?d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D&newsess=false&roleid=DP010101/0007&origin=ourprogram results in a transformation to /webapp/ux/home?d=1705914006565&status=login&ticket=1705914090394_HzJpTROVfhW-JhRW0OqDbHu7tWXXlgrKSUmOzIMsZNCcUIiYGMXX_Q%3D%3D≠wsess=false&roleid=DP010101/0007∨igin=ourprogram with the microscopic view of the text being:

&newsess=false&roleid=DP010101/0007&or
into
≠wsess=false&roleid=DP010101/0007∨

It appears that I solved that by looking at this. The HTML entity Codec is translating &ne into ≠, and then the &or detection is a legitimate bug that I'm staring at right now. But at any rate, combined with the percents in the original input, that's a mixed encoding exception before we even get to the &or.

I'm stumped as to why we're translating that &or however. This is just strange.
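For reference, the transformation in Issue 2 is what a decoder produces if it accepts named entities without the trailing semicolon. The sketch below is not ESAPI's HTMLEntityCodec and makes no claim about how that codec is implemented; `LenientEntitySketch`, `decodeLenient`, and the two-entry table are hypothetical and exist only to reproduce the observed output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not ESAPI code.
final class LenientEntitySketch {

    // Tiny entity table, limited to the two names seen in this issue.
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("ne", "\u2260"); // NOT EQUAL TO
        ENTITIES.put("or", "\u2228"); // LOGICAL OR
    }

    // Replace "&name" with its character; the trailing ';' is optional.
    static String decodeLenient(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '&') {
                String match = null;
                // Pick the longest entity name that follows the '&'.
                for (String name : ENTITIES.keySet()) {
                    if (input.startsWith(name, i + 1)
                            && (match == null || name.length() > match.length())) {
                        match = name;
                    }
                }
                if (match != null) {
                    out.append(ENTITIES.get(match));
                    i += 1 + match.length();
                    if (i < input.length() && input.charAt(i) == ';') {
                        i++; // consume the optional semicolon
                    }
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeLenient("&newsess=false&roleid=DP010101/0007&origin=ourprogram"));
    }
}
```

With this lenient matching, `&newsess` and `&origin` lose their first three characters to an entity, while `&roleid` is left alone because `ro` is not in the table.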

The FP issue will be an easy fix and can go out with the next point release, but the misdetection on &or... who knows. I think that's its own issue.

Jan 22 '24 15:01 xeno6696
