ring icon indicating copy to clipboard operation
ring copied to clipboard

Params middleware does not decode some strings correctly

Open iku000888 opened this issue 8 years ago • 5 comments
trafficstars

Hi James, thanks for the continued work on maintaining ring.

Just want to throw this issue out there in case anyone else stumbles upon the same issue, plus gathering some feedback.

In our project, we were dealing with non utf-8 encoded forms (Shift_JIS to be specific), and found that the params middleware is garbling the incoming string, even though we were using the wrap-params middleware with the Shift_JIS option e.g. (param/wrap-params {:encoding "Shift_JIS"})

Looking at the source code deeper inside, I see that java.net.URLDecoder is being used for parsing, and that java.net.URLDecoder does not decode some non utf-8 strings correctly, while org.apache.commons.codec.net.URLCodec can, illustrated in below snippet.

  ;; "モジバケコワイ", URL encoded with Shif_JIS by the browser
  "%83%82%83W%83o%83P%83R%83%8F%83C"

 (import [java.net URLDecoder])
 (URLDecoder/decode "%83%82%83W%83o%83P%83R%83%8F%83C" "Shift-JIS")
 ;; => "モ�W�o�P�Rワ�C" 

 (import [org.apache.commons.codec.net URLCodec])
 (let [codec (URLCodec. "Shift-JIS")]
   (.decode codec "%83%82%83W%83o%83P%83R%83%8F%83C" "Shift-JIS"))
 ;; => "モジバケコワイ"

We came up with 2 work arounds for this issue:

  1. Use org.apache.commons.codec.net.URLCodec instead of java.net.URLDecoder for decoding URL encoded parameteres
  2. Use a form with an enctype of multipart/form-data so that nothing gets encoded and thus avoid the problem entirely

Would appreciate the if I can get feed back on:

  1. Which of the above workaround is preferable?
  2. Would you be interested in a PR that replaces the decoder used in the params middleware with org.apache.commons.codec.net URLCodec?

Thanks!

iku000888 avatar Nov 29 '16 09:11 iku000888

Thanks for the feedback! It's odd that the Java URLDecoder doesn't work for the Shift-JIS charset.

If the Commons URLCodec works better, and there's no other side effects, then I think we should swap to that in the Ring-Codec library. It already depends on the commons-codec library, so I don't think there should be any disadvantage to it. I've done some quick benchmarks, and commons-codec seems about twice as fast decoding anyway.

weavejester avatar Dec 01 '16 11:12 weavejester

Looks like it'll take a bit of work to update the Ring-Codec library. As well as the URLDecoder, Ring-Codec has its own custom implementation for percent encoding that suffers from the same issue.

weavejester avatar Dec 01 '16 14:12 weavejester

Thanks for taking a look at this! Let me know if there is anything I can help with!

iku000888 avatar Dec 01 '16 23:12 iku000888

I took a while that the scope of the issue is in ring-codec. I have created https://github.com/ring-clojure/ring-codec/pull/15 which just replaces the decoder. Maybe it is better to move the discussion over there?

iku000888 avatar Dec 07 '16 05:12 iku000888

I have faced a similar issue where wrap-params causes the handler to sometimes throw java.io.UnsupportedEncodingException due to bad encoding. It would be great to return HTTP 400 as a response to such requests - I am currently using a workaround on top of wrap-params by applying a middleware to intercept the exception and throw HTTP 400.

Calling (.getMessage ^UnsupportedEncodingException ex) gives the encoding. HTTP 415 may be an alternative status code for this use case.

kumarshantanu avatar Jan 01 '18 18:01 kumarshantanu