esapi-java-legacy
esapi-java-legacy copied to clipboard
PercentCodec Doesn't Handle UTF-8 percent encoding
//Place in EncoderTest
public void testESAPIPercentEncoding() {
String input = "%E2%84%A2";
String expected = "™";
Encoder e = ESAPI.encoder();
assertEquals(expected, e.canonicalize(input));
}
It seems that the uri encoder is defaulting to a non-UTF-8 decoding, and I don't see anyway to control this behavior when searching ESAPI.properties
(This migrated from an issue I accidently opened on the newer 3.0 project.)
@kwwall -- I think to solve this problem, the only solution would be to create a fork of the PercentCodec that is specialized to UTF-8, and then have users specify the UTF-8 codec on demand.
The only problem is that UTF-8 would probably be the better one to use by default since it's a web standard.
Thoughts?
Why not make a new class, say DefaultUTF8Encoder. If people want to use that one, then can just specify something like:
ESAPI.Encoder=org.owasp.esapi.reference.DefaultUTF8Encoder
in their ESAPI.properties file. That's more or less how it's designed to work. Also, if you change the CTOR(s) in DefaultEncoder from private to protected, you probably can subclass DefaultEncoder and inherit a lot of its desired behavior.
Make sense? If not, it's probably because I'm misunderstanding something.
Note that also if you think it makes more sense to make the UTF8 encoder the default one, we can change the ESAPI.properties files to reflect that as well.
-kevin
On Mon, Jul 17, 2017 at 1:04 PM, Matt Seil [email protected] wrote:
@kwwall https://github.com/kwwall -- I think to solve this problem, the only solution would be to create a fork of the PercentCodec that is specialized to UTF-8, and then have users specify the UTF-8 codec on demand.
The only problem is that UTF-8 would probably be the better one to use by default since it's a web standard.
Thoughts?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ESAPI/esapi-java-legacy/issues/377#issuecomment-315814214, or mute the thread https://github.com/notifications/unsubscribe-auth/AB3nmwlx3UxMGNNDWmdgU3V2RYdYAl3Zks5sO5JzgaJpZM4IjW-r .
-- Blog: http://off-the-wall-security.blogspot.com/ | Twitter: @KevinWWall NSA: All your crypto bit are belong to us.
My concern about UTF-8 would be how many people would complain on the upgrade.
That said, it would help spread the word that folks ought to be "all utf-8" at least as far as web app encodings.
On Mon, Jul 17, 2017, 22:13 Kevin W. Wall [email protected] wrote:
Why not make a new class, say DefaultUTF8Encoder. If people want to use that one, then can just specify something like:
ESAPI.Encoder=org.owasp.esapi.reference.DefaultUTF8Encoder
in their ESAPI.properties file. That's more or less how it's designed to work. Also, if you change the CTOR(s) in DefaultEncoder from private to protected, you probably can subclass DefaultEncoder and inherit a lot of its desired behavior.
Make sense? If not, it's probably because I'm misunderstanding something.
Note that also if you think it makes more sense to make the UTF8 encoder the default one, we can change the ESAPI.properties files to reflect that as well.
-kevin
On Mon, Jul 17, 2017 at 1:04 PM, Matt Seil [email protected] wrote:
@kwwall https://github.com/kwwall -- I think to solve this problem, the only solution would be to create a fork of the PercentCodec that is specialized to UTF-8, and then have users specify the UTF-8 codec on demand.
The only problem is that UTF-8 would probably be the better one to use by default since it's a web standard.
Thoughts?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/ESAPI/esapi-java-legacy/issues/377#issuecomment-315814214 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AB3nmwlx3UxMGNNDWmdgU3V2RYdYAl3Zks5sO5JzgaJpZM4IjW-r
.
-- Blog: http://off-the-wall-security.blogspot.com/ | Twitter: @KevinWWall NSA: All your crypto bit are belong to us.
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ESAPI/esapi-java-legacy/issues/377#issuecomment-315960047, or mute the thread https://github.com/notifications/unsubscribe-auth/AJEAQSAnvXkKdCaEUrCaTSl1JSID6PGgks5sPD8UgaJpZM4IjW-r .
Oh yeah, and a new encoder is exactly what I had in mind, with that kind of config.
If concerned about people complaining about the upgrade to the UTF-8 encoder, we could leave the default as is (DefaultEncoder) and just mention the alternative (DefaultUTF8Encoder, or whatever you want to name it)in a comment in ESAPI.properties just before the ESAPI.Encoder property.
@kwwall turns out this is non-trivial as well. @jeremiahjstacey I'm having a mental block right now. I'm not sure how to properly differentiate UTF-8 encoded strings... like %E2
is legal in the ASCII range, but how to tell the difference between %E2
and %E2%84%A2
?
Isn't the usual approach to go with the longest match?
-kevin
Blog: http://off-the-wall-security.blogspot.com/ | Twitter: @KevinWWall NSA: All your crypto bit are belong to us.
On Sun, Sep 1, 2019, 12:49 Matt Seil [email protected] wrote:
@kwwall https://github.com/kwwall turns out this is non-trivial as well. @jeremiahjstacey https://github.com/jeremiahjstacey I'm having a mental block right now. I'm not sure how to properly differentiate UTF-8 encoded strings... like %E2 is legal in the ASCII range, but how to tell the difference between %E2 and %E2%84%A2?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ESAPI/esapi-java-legacy/issues/377?email_source=notifications&email_token=AAO6PG3XYFGREY4UVJF4D33QHPXAHA5CNFSM4CENN6V2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5UGMCY#issuecomment-526935563, or mute the thread https://github.com/notifications/unsubscribe-auth/AAO6PGZPJAPT2G74GRS3AGTQHPXAHANCNFSM4CENN6VQ .
@xeno6696 without digging into it too much I'd suggest checking what the PushbackString would give you in a test case. I have it in my head that it defaults to the shortest group, but it's been a while since I've looked at it.
@jeremiahjstacey is correct, the PBS defaults to the shortest match.
Works for me. I think I was thinking of regex matching.
-kevin
Blog: http://off-the-wall-security.blogspot.com/ | Twitter: @KevinWWall NSA: All your crypto bit are belong to us.
On Sun, Sep 1, 2019, 14:30 Matt Seil [email protected] wrote:
@jeremiahjstacey https://github.com/jeremiahjstacey is correct, the PBS defaults to the shortest match.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ESAPI/esapi-java-legacy/issues/377?email_source=notifications&email_token=AAO6PGYJG3NG6XJ55CWNGY3QHQC3JA5CNFSM4CENN6V2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5UICIQ#issuecomment-526942498, or mute the thread https://github.com/notifications/unsubscribe-auth/AAO6PG443ADEUOMSEIQCMGDQHQC3JANCNFSM4CENN6VQ .
I could have sworn I saw a post in here asking about the percent codec...
@xeno6696 - You did. I still have it in my inbox. It was from Alessandro Giannone and said "Any chance the percent encoding is working with UTF-8?"
Perhaps he deleted it when he added comment to (closed) issue #303?? See https://github.com/ESAPI/esapi-java-legacy/issues/303#issuecomment-708700394 for details. Or perhaps he wasn't using a patched version of ESAPI to test it???
@agiannone -- Is there a reason you deleted your comment from this issue (Issue #377)?
@xeno6696 @kwwall - Thanks for the super prompt response!
After reading through the issues list and doing a bit of studying on the subject, I actually ended up moving my comment to #303 referencing #300 both of which seemed more applicable.
Effectively non-BMP characters use 3 pairs of hex digits to store the character. The PercentCodec evaluates them 1 pair at a time.
Since this is still open, I wanted to toss out what seems like a useful resource, at least to me: