autojson can't correctly handle Chinese characters

Open xm0625 opened this issue 9 years ago • 14 comments

Whatever I set response.charset or the charset in response.content_type to, the Chinese characters are always converted to unicode characters. So I can only turn off autojson and output the JSON string myself. json.dumps(<dict_to_convert>, ensure_ascii=True) behaves like autojson (unicode); json.dumps(<dict_to_convert>, ensure_ascii=False) is what I want (bytestring). Maybe Bottle needs to provide an option.

xm0625 avatar Sep 01 '16 03:09 xm0625

response.content_type = 'application/json; charset=' + response.charset
result_data = json.dumps({"code": "1", "message": "ok", "data": response_data}, encoding=response.charset, ensure_ascii=False)
return result_data

xm0625 avatar Sep 01 '16 04:09 xm0625

Can you give some example code that shows what the problem is? What do you mean "converted to unicode characters"? Chinese characters only exist as unicode characters.

eric-wieser avatar Sep 01 '16 06:09 eric-wieser

@app.route('/api/hello', method='GET')
def hello():
    return {"hello": "世界"}

the output is

{"hello": "\u4e16\u754c"}

the Content-Type in response header:

Content-Type:application/json

xm0625 avatar Sep 01 '16 08:09 xm0625

@app.route('/api/hello', method='GET')
def hello():
    response.content_type = 'application/json; charset=UTF-8'
    return {"hello": "世界"}

the output is

{"hello": "\u4e16\u754c"}

the Content-Type in response header:

Content-Type:application/json

xm0625 avatar Sep 01 '16 08:09 xm0625

Python 2 or 3?

You realize that that output is perfectly acceptable behavior, right? Any reasonable JSON parser will parse {"hello": "世界"} and {"hello": "\u4e16\u754c"} to the same thing.

I think that defaulting to ensure_ascii=True is sane behavior, since it's more likely that tools choke on unicode input than it is that their JSON parser is non-compliant. Having said that, it's obviously awkward for you not to be able to read the Chinese characters in a raw response.
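The equivalence of the two forms can be checked with the standard library alone. This is a minimal sketch assuming Python 3 (the thread itself concerns Python 2); the variable names are only for illustration.

```python
import json

data = {"hello": "世界"}

# ensure_ascii=True is the default: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)
# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(data, ensure_ascii=False)

print(escaped)
print(readable)

# Both documents decode to the same Python object.
same = json.loads(escaped) == json.loads(readable) == data
print(same)
```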

What do you want the api to look like to tell bottle your choice?

eric-wieser avatar Sep 01 '16 09:09 eric-wieser

As a hack, you can currently do:

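import json
import bottle

app = bottle.default_app()  # or your app object
# Replace the JSON plugin's serializer with one that keeps non-ASCII characters
app.plugins[0].json_dumps = lambda *args, **kwargs: \
    json.dumps(*args, ensure_ascii=False, **kwargs).encode('utf8')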

eric-wieser avatar Sep 01 '16 09:09 eric-wieser

Python 2. I know the two are the same thing. I prefer to output the content as a bytestring ('\xe4\xb8\x96\xe7\x95\x8c', i.e. "世界"); the unicode ('\u4e16\u754c') in the API output is not readable, and that bothers me. I temporarily solved this with a decorator that uses json.dumps(xxx, ensure_ascii=False). Your solution is also very good. Thanks. ^.^
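A decorator along these lines might look like the sketch below. It assumes Python 3, and the name utf8_json is hypothetical and not part of Bottle. It only serializes the return value; the Content-Type header would still have to be set separately on the response.

```python
import functools
import json


def utf8_json(handler):
    """Hypothetical decorator: turn a handler's dict result into UTF-8 JSON bytes."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        result = handler(*args, **kwargs)
        # ensure_ascii=False keeps non-ASCII characters unescaped.
        return json.dumps(result, ensure_ascii=False).encode("utf-8")
    return wrapper


@utf8_json
def hello():
    return {"hello": "世界"}


body = hello()
print(body.decode("utf-8"))
```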

xm0625 avatar Sep 01 '16 09:09 xm0625

By that you presumably meant to say "I'd prefer to output UTF-8 encoded characters rather than \u-escaped characters". In both cases the output is a bytestring.

eric-wieser avatar Sep 01 '16 09:09 eric-wieser

JSON is defined as UTF-8 encoded text, so it should be fine to include non-ascii characters in JSON strings and there is no need for these unicode escape sequences. It would certainly save some bytes and be easier to read for humans, and human readable API responses are a good thing in my opinion.

If both representations are equal and the human readable one actually saves some bytes and involves no additional overhead, I'd say we should switch the default to the human readable representation.
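The size difference is easy to measure. A small sketch, assuming Python 3, that compares the encoded length of the same document in both forms:

```python
import json

data = {"hello": "世界"}

# Escaped form: each of the two characters becomes a 6-byte \uXXXX sequence.
escaped_bytes = json.dumps(data).encode("utf-8")
# Unescaped form: each of the two characters takes 3 bytes in UTF-8.
utf8_bytes = json.dumps(data, ensure_ascii=False).encode("utf-8")

print(len(escaped_bytes), len(utf8_bytes))
```

For this document the unescaped form is 6 bytes shorter.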

defnull avatar Sep 01 '16 16:09 defnull

I'd like to introduce two new config parameters:

  • autojson.ascii (default: false). If true, convert all non-ascii characters to escape sequences.
  • autojson.compact (default: false). If true, return compact (but less readable) JSON with less whitespace.

And a new default behavior: The autojson plugin should return human readable and developer friendly JSON by default (indented, with unicode characters in strings).
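One way the two proposed switches could map onto json.dumps arguments is sketched below. This is not Bottle's implementation: the function dumps_with_options and its parameter names are hypothetical, and the indent width is an arbitrary choice.

```python
import json


def dumps_with_options(obj, ascii_only=False, compact=False):
    """Hypothetical mapping of the proposed autojson.ascii / autojson.compact flags."""
    if compact:
        # No indentation and no spaces after separators.
        return json.dumps(obj, ensure_ascii=ascii_only, separators=(",", ":"))
    # Human-readable default: indented output.
    return json.dumps(obj, ensure_ascii=ascii_only, indent=2)


data = {"hello": "世界"}
print(dumps_with_options(data))
print(dumps_with_options(data, compact=True))
print(dumps_with_options(data, ascii_only=True, compact=True))
```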

This might break (broken) tests that test against hard-coded JSON strings, but should not break any real world applications or integrations. We can still discuss if the default values for the configuration should reflect the current behavior, and warn about the changing defaults with depr(0,13) so we can change it in 0.14.

defnull avatar Sep 01 '16 17:09 defnull

http://i.imgur.com/0PVtmcG.gif

defnull avatar Sep 01 '16 21:09 defnull

Thanks for the reply, @defnull. I'm glad my idea can be adopted. Bottle is quite good to use.

xm0625 avatar Sep 02 '16 02:09 xm0625

I just committed a long overdue ConfigDict patch and a more sophisticated autojson config to master. This feature request should now be quite straightforward to implement. See Bottle.__init__() and JsonPlugin.

defnull avatar Sep 25 '16 18:09 defnull