UTF-8 in headers is not handled correctly
It seems like UTF-8 content in response headers causes an error due to the following code:
    for header in response_headers:
        towrite.append(('%s: %s\r\n' % header).encode('latin-1'))
I can't find a spec saying that response header values have to be Latin-1, and I'm not the only one for whom this seems unspecified. In the case of HTTP/2, the RFC says: "Just as in HTTP/1.x, header field names are strings of ASCII characters".
This leads me to believe that the values should not be forcibly encoded as Latin-1; valid UTF-8 should pass through unmodified. Alternatively, for ease of use, eventlet could %-encode any UTF-8 values instead of erroring out like this.
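As a minimal sketch of both points, the snippet below reproduces the failure with the same formatting and encoding as the quoted loop, then shows one way the %-encoding alternative could look. The names encode_header_line and percent_encode_non_ascii are hypothetical helpers written for this illustration, not eventlet APIs. Note that only characters outside Latin-1 (above U+00FF) raise; a character such as "é" encodes without error.

```python
def encode_header_line(name, value):
    # Same formatting and encoding as the loop quoted above.
    return ('%s: %s\r\n' % (name, value)).encode('latin-1')


def percent_encode_non_ascii(value):
    """Hypothetical helper (not an eventlet API).

    %-encodes the UTF-8 bytes of every non-ASCII character, and '%'
    itself so that the result stays unambiguous.
    """
    parts = []
    for ch in value:
        if ord(ch) < 128 and ch != '%':
            parts.append(ch)
        else:
            parts.append(''.join('%%%02X' % b for b in ch.encode('utf-8')))
    return ''.join(parts)


# The euro sign (U+20AC) is not representable in Latin-1, so this raises.
try:
    encode_header_line('X-Price', '5 €')
    failed = False
except UnicodeEncodeError:
    failed = True

# After %-encoding, the value is pure ASCII and encodes without error.
safe_line = encode_header_line('X-Price', percent_encode_non_ascii('5 €'))
print(failed, safe_line)
```

The receiving side would have to know that the value is %-encoded, so this is a convention between sender and client rather than something HTTP itself applies.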
Hello,
Thanks for reporting this issue.
Apparently there are many places in eventlet where Latin-1 is used to encode or decode; unfortunately, I don't know if there is a reason behind that...
I will have to dig a bit deeper before giving you an answer.
So, I did some research on my side, and I agree with you. Valid UTF-8 should pass, and header field names should be strings of ASCII characters.
My main concern is that this kind of change could heavily impact existing implementations using Eventlet. We are close to retiring Eventlet, so I wonder if we really want to implement such changes in the middle of the migration away from Eventlet. That's a potential source of breakage.
Do you want to propose a patch?
I doubt it can cause breakage, because after the change the code would simply be less strict and less fragile. Anything non-ASCII is already not passing through, and anything ASCII would remain the same.
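A quick way to check this claim is to compare the bytes produced by the current Latin-1 encoding with those produced if values were encoded as UTF-8. This is only a sketch: encode_current, encode_proposed and compare are hypothetical names, and encode_proposed assumes the change keeps Latin-1 for names and uses UTF-8 for values. One case worth noting is the U+0080 to U+00FF range, which Latin-1 does accept today, so its bytes would differ after the change.

```python
def encode_current(name, value):
    # Current behaviour: the whole line is encoded as Latin-1.
    return ('%s: %s\r\n' % (name, value)).encode('latin-1')


def encode_proposed(name, value):
    # Assumed behaviour after the change: Latin-1 name, UTF-8 value.
    return b'%s: %s\r\n' % (name.encode('latin-1'), value.encode('utf-8'))


def compare(name, value):
    """Return (current_bytes_or_None, proposed_bytes)."""
    try:
        current = encode_current(name, value)
    except UnicodeEncodeError:
        current = None  # the current code raises for this value
    return current, encode_proposed(name, value)


ascii_case = compare('Content-Type', 'text/html')  # identical bytes
cjk_case = compare('X-Title', '日本')               # raises today, works after
latin1_case = compare('X-Name', 'café')            # works today, bytes change
print(ascii_case, cjk_case, latin1_case)
```

So pure ASCII is unaffected and characters above U+00FF go from an error to working, while values that rely on Latin-1 bytes for "é"-style characters would be sent as two-byte UTF-8 sequences instead.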
The easiest would probably be something like this:
    for header in response_headers:
        name, value = header
        name, value = name.encode('latin-1'), value.encode('utf-8')
        towrite.append(b'%s: %s\r\n' % (name, value))
I'm unsure about the intended Python version support; it might be possible to write this as a nicer f-string, or even to refactor it out into a wrapper (which would also make it possible for some to monkeypatch in things like %-encoding). But the core concept would remain the same.
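For illustration, the wrapper variant could look roughly like the sketch below. encode_response_header and build_header_lines are hypothetical names, not existing eventlet functions. Note that Python has no bytes f-string, so bytes %-formatting (available since Python 3.5) or plain concatenation is what remains for building the line.

```python
def encode_response_header(name, value):
    """Hypothetical default encoder: Latin-1 name, UTF-8 value."""
    return b'%s: %s\r\n' % (name.encode('latin-1'), value.encode('utf-8'))


def build_header_lines(response_headers, encode=encode_response_header):
    # The encoder is a parameter so callers can swap in another policy
    # (for example %-encoding) without touching the loop itself.
    return [encode(name, value) for name, value in response_headers]


headers = [('Content-Type', 'text/plain'), ('X-Title', '日本')]
default_lines = build_header_lines(headers)

# Example of a swapped-in policy: ASCII only, replacing anything else by '?'.
ascii_only_lines = build_header_lines(
    headers,
    encode=lambda n, v: b'%s: %s\r\n' % (n.encode('ascii'),
                                        v.encode('ascii', 'replace')),
)
print(default_lines, ascii_only_lines)
```

Whether the encoder is passed as an argument or patched as a module-level function is a design choice; the argument form is shown here only because it keeps the example self-contained.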
OK, thanks for sharing your thoughts. Would you like to propose a patch?
@4383 Something like the above, or should I refactor it out into a function?
Something like the above should be enough. Thanks.