snap-extras
writeJSON is not UTF-8 compliant
If we write an instance of a `ToJSON` data type using `writeJSON`, it doesn't correctly handle UTF-8 text that contains accented letters. This is an excerpt of an Italian text, as produced by the current function:
di essere il piÃ¹ precisi possibile nell'inserimento
The problem is twofold:

a) we need to encode not with the standard `encode` function, but with the one inside `Data.Aeson.Encode`;

b) we need to set the `charset=utf-8` encoding inside the HTTP Content-Type header.
This is the proposed patch:

```haskell
-- | Set MIME to 'application/json' and write the given object into the
-- 'Response' body. Exactly like Snap.Extras' @writeJSON@, but handles
-- UTF-8 text correctly.
writeEncodedJSON :: (MonadSnap m, ToJSON a) => a -> m ()
writeEncodedJSON a = do
  modifyResponse $ setHeader "Content-Type" "application/json; charset=utf-8"
  writeLBS . AE.encode $ a
```

where `AE.encode` is a qualified import of `Data.Aeson.Encode`.
With the proposed patch, everything works as expected:
di essere il più precisi possibile nell'inserimento
I also suggest we refactor out the `modifyResponse` call, perhaps creating a combinator which adds the utf-8 charset to the Content-Type, so that we can reuse what we already have: `jsResponse`, `jsonResponse`, etc.
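A possible shape for such a combinator, as a sketch only: the name `utf8JsonResponse` is hypothetical, and `Snap.Core` plus `Data.Aeson` are assumed to be in scope.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Aeson (ToJSON)
import qualified Data.Aeson as AE
import           Snap.Core (MonadSnap, modifyResponse, setHeader, writeLBS)

-- Set the JSON Content-Type (with an explicit charset) in one reusable place...
utf8JsonResponse :: MonadSnap m => m ()
utf8JsonResponse =
  modifyResponse $ setHeader "Content-Type" "application/json; charset=utf-8"

-- ...so that writeEncodedJSON (and jsResponse, jsonResponse, etc.)
-- can all share it instead of repeating the header logic.
writeEncodedJSON :: (MonadSnap m, ToJSON a) => a -> m ()
writeEncodedJSON a = do
  utf8JsonResponse
  writeLBS (AE.encode a)
```
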
A.
Ah, weird. Couple of questions:
- Aren't we already using the encode from Data.Aeson? A look at http://hackage.haskell.org/packages/archive/aeson/0.6.2.0/doc/html/src/Data-Aeson-Generic.html#encode shows that we are using Data.Aeson.Encode.encode. Am I missing something here?
- As explained here (http://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean), I thought all JSON is automatically interpreted as UTF-8, and therefore the additional charset annotation is unnecessary?
- What front-end/client/browser are you using to interpret the results? It almost sounds like you're using an invalid parser that is NOT assuming JSON is UTF-8, but is instead assuming it is Latin-1 or ASCII or something. As far as I know, that is invalid behavior. For example, try passing a non-UTF-8 string to aeson for parsing and it will crap out with an error. It forces you to ensure your input is UTF-8 encoded.
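That failure mode can be reproduced in a few lines, assuming only the `text` and `bytestring` packages; the `Char8` functions give a Latin-1 view of the raw bytes, which is exactly what a misbehaving client does.

```haskell
import qualified Data.ByteString.Char8 as B8   -- Char8 unpack reads each byte as Latin-1
import           Data.Text (pack)
import           Data.Text.Encoding (encodeUtf8)

main :: IO ()
main = do
  -- "più" encoded as UTF-8: the bytes 0x70 0x69 0xC3 0xB9
  let utf8Bytes = encodeUtf8 (pack "pi\249")
  -- A UTF-8-aware client recovers "più"; a Latin-1 client turns each
  -- byte into its own character, producing the tell-tale "Ã¹" mojibake.
  print (B8.unpack utf8Bytes)   -- "pi\195\185", i.e. "piÃ¹"
```
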
Hi Oz, let me elaborate on this and I will get back to you. I can reply to the third point straight away:
- I'm using Google Chrome, so I don't think I'm in any way doing something an end user wouldn't do.
I'll get back to you later on points 1 and 2.
@adinapoli any update on this?