
writeJSON is not UTF-8 compliant

Open adinapoli opened this issue 11 years ago • 3 comments

If we write an instance of a ToJSON data type using writeJSON, it doesn't correctly handle UTF-8 text that contains accented letters. This is an excerpt of an Italian text, as produced by the current function:

di essere il piÃ¹ precisi possibile nell'inserimento

The problem is twofold:

a) We need to encode not with the standard encode function, but with the one inside Data.Aeson.Encode.
b) We need to set the charset=utf-8 encoding inside the HTTP Content-Type header.
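The symptom is the classic UTF-8-read-as-Latin-1 mojibake. A minimal sketch reproducing it (this uses only the text package, not snap-extras; misreadAsLatin1 is my own illustrative name):

```haskell
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Take correctly UTF-8-encoded bytes and misread them as Latin-1,
-- which is what a client may do when no charset is declared:
-- each byte of a multi-byte UTF-8 sequence becomes its own character.
misreadAsLatin1 :: T.Text -> T.Text
misreadAsLatin1 = TE.decodeLatin1 . TE.encodeUtf8

main :: IO ()
main = putStrLn (T.unpack (misreadAsLatin1 (T.pack "più")))
-- "ù" (0xC3 0xB9 in UTF-8) is displayed as the two Latin-1 chars "Ã¹"
```

Plain ASCII survives the round trip unchanged, which is why the bug only shows up on accented text.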

This is the proposed patch:

-------------------------------------------------------------------------------
-- | Set MIME to 'application/json; charset=utf-8' and write the given
-- object into the 'Response' body. Exactly like Snap.Extras' @writeJSON@,
-- but correctly handles UTF-8 text.
writeEncodedJSON :: (MonadSnap m, ToJSON a) => a -> m ()
writeEncodedJSON a = do
  modifyResponse $ setHeader "Content-Type" "application/json; charset=utf-8"
  writeLBS . AE.encode $ a

where AE.encode comes from a qualified import of Data.Aeson.Encode (import qualified Data.Aeson.Encode as AE).

With the proposed patch, everything works as expected:

di essere il più precisi possibile nell'inserimento

I also suggest we factor out the modifyResponse call, maybe creating a combinator which adds the utf-8 charset to the content type, so that we can reuse what we already have: jsResponse, jsonResponse etc.
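A rough sketch of what I mean (jsonResponseUtf8 and writeEncodedJSON' are hypothetical names, not existing snap-extras API; this assumes snap-core's modifyResponse/setHeader and aeson's encode):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Snap.Core  (MonadSnap, modifyResponse, setHeader, writeLBS)
import qualified Data.Aeson as AE

-- Hypothetical combinator: set the Content-Type for a JSON response,
-- including the utf-8 charset, so the writeJSON-style helpers can share it.
jsonResponseUtf8 :: MonadSnap m => m ()
jsonResponseUtf8 =
  modifyResponse $ setHeader "Content-Type" "application/json; charset=utf-8"

-- writeEncodedJSON rebuilt on top of the combinator.
writeEncodedJSON' :: (MonadSnap m, AE.ToJSON a) => a -> m ()
writeEncodedJSON' a = jsonResponseUtf8 >> writeLBS (AE.encode a)
```

The same combinator could then back jsResponse, jsonResponse and friends.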

A.

adinapoli avatar Aug 18 '13 13:08 adinapoli

Ah, weird. Couple of questions:

  1. Aren't we already using the encode from Data.Aeson? A look at http://hackage.haskell.org/packages/archive/aeson/0.6.2.0/doc/html/src/Data-Aeson-Generic.html#encode shows that we are using Data.Aeson.Encode.encode. Am I missing something here?
  2. As explained here (http://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean), I thought all JSON is automatically interpreted as UTF-8, and therefore the additional charset declaration is unnecessary?
  3. What front-end/client/browser are you using to interpret the results? It almost sounds like you're using an invalid parser that is NOT assuming JSON is UTF-8 but is instead assuming Latin-1 or ASCII or something. As far as I know, that is invalid behavior. For example, try passing a string that isn't valid UTF-8 to aeson for parsing and it will crap out with an error. It forces you to ensure your input is UTF-8 encoded.
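For the record, with a modern aeson, Data.Aeson.encode (which is the Data.Aeson.Encode encoder) emits raw UTF-8 bytes rather than ASCII escapes; a small check (my own illustration, jsonBytes is not part of aeson):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Aeson           as A
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text            as T
import           Data.Word            (Word8)

-- Encode a value and expose the raw bytes of the resulting JSON.
jsonBytes :: A.ToJSON a => a -> [Word8]
jsonBytes = BL.unpack . A.encode

main :: IO ()
main =
  -- "più": quote, 'p', 'i', then ù as the UTF-8 pair 0xC3 0xB9, quote
  print (jsonBytes ("più" :: T.Text))
```

So the bytes on the wire are fine; the question is only whether the client knows to read them as UTF-8.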

ozataman avatar Sep 16 '13 15:09 ozataman

Hi Oz, let me elaborate on this and I will get back to you. I can reply to 3) straight away:

  3. I'm using Google Chrome, so I don't think I'm in any way doing something an end user wouldn't do.

I'll get back later to you with points 1 and 2.

adinapoli avatar Sep 16 '13 15:09 adinapoli

@adinapoli any update on this?

tom-bop avatar Aug 27 '18 15:08 tom-bop