bottle icon indicating copy to clipboard operation
bottle copied to clipboard

Charset can no longer be overridden

Open foxx opened this issue 7 years ago • 4 comments

Also a duplicate of #820.

The docs state the following;

Bottle uses the charset parameter of the Content-Type header to decide how to encode unicode strings. This header defaults to text/html; charset=UTF8 and can be changed using the Response.content_type attribute or by setting the Response.charset attribute directly. (The Response object is described in the section The Response Object.)

Doing so will result in the following;

(Pdb) response.charset = 'UTF-8'
*** AttributeError: can't set attribute

Source code shows the following;

    @property
    def charset(self, default='UTF-8'):
        """ Return the charset specified in the content-type header (default: utf8). """
        if 'charset=' in self.content_type:
            return self.content_type.split('charset=')[-1].split(';')[0].strip()
        return default

In order to override charset, we'd have to append the string to content_type, which is a bit hacky.

@defnull Should I send a patch to introduce a setter on this property to fix this functionality?

foxx avatar Mar 08 '18 12:03 foxx

I'm not sure if it ever worked as documented in the tutorial, and I tend to say that its better to fix the docs instead of the code. Here is why:

Adding a setter for BaseRequest.charset would require parsing the existing content-type header in a proper way (to ensure other header options are not lost, if any), add or replace a charset option, then generating a new header value. Or faking it with a regexp. A lot of complexity for something that is easily done outside of bottle in app code by just setting the full header value directly. Most of the time you'd set both at the same time and always to the same value anyway.

response.content_type = "text/plain; charset=not-utf8"

Also, what happens if I set the charset first, but the content type later? The current implementation would simply overwrite the content-type header and the charset would be lost, introducing an unintuitive coupling between these to. Hiding/abstracting away too much of the HTTP protocol from the developer has its drawbacks.

How about extending the BaseResponse.set_header() method to accept options? Then this would be a nice alternative:

response.set_header("Content-Type", "text/plain", charset="not-utf8")

This should properly quote the option-values of cause.

defnull avatar Mar 08 '18 13:03 defnull

I'm not sure if it ever worked as documented in the tutorial, and I tend to say that its better to fix the docs instead of the code. Here is why:

Yeah I looked back over code dating 2012 and couldn't see it ever working as documented.

How about extending the BaseResponse.set_header() method to accept options?

Agreed, this makes more sense. PR coming shortly.

PS) thank you for the quick reply @defnull !

foxx avatar Mar 08 '18 13:03 foxx

Great, looking forward to it :)

Parsing headers is already implemented (_parse_http_header(h)). You can get inspiration from there, or perhaps you find a ready to use solution in the stdlib that works with all supported python versions. Quoting in headers is a bit tricky to get right.

defnull avatar Mar 08 '18 13:03 defnull

Hey @defnull any chance of a review for PR #1049? Would be nice to get it in the next release cycle if possible!

foxx avatar Apr 15 '18 21:04 foxx