
UTF-8 in headers is not handled correctly

Open Avamander opened this issue 5 months ago • 6 comments

It seems like UTF-8 content in response headers causes an error due to the following code:

for header in response_headers:
    towrite.append(('%s: %s\r\n' % header).encode('latin-1'))

I can't find a spec saying that response header values have to be Latin-1, and I'm not the only one for whom it does not seem to be specified. In the case of H2 the RFC says: "Just as in HTTP/1.x, header field names are strings of ASCII characters".

This leads me to believe that the values should not be forcefully encoded as Latin-1; valid UTF-8 should pass through unmodified. Alternatively, for ease of use, eventlet could %-encode any UTF-8 values instead of erroring out like this.
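A minimal reproduction of the failure described above, outside of eventlet. The header list and the `errors` list are made up for illustration; only the `encode('latin-1')` line mirrors the snippet quoted from the library.

```python
# Hypothetical header list for illustration; not taken from eventlet itself.
response_headers = [
    ("Content-Type", "text/plain"),
    ("X-City", "Rīga"),  # "ī" (U+012B) has no Latin-1 representation
]

towrite = []
errors = []
for header in response_headers:
    try:
        # Same formatting and encoding step as the snippet quoted above.
        towrite.append(("%s: %s\r\n" % header).encode("latin-1"))
    except UnicodeEncodeError as exc:
        # Record which header could not be encoded instead of crashing.
        errors.append((header[0], exc.reason))
```

The ASCII-only header is written as before, while the header containing a character outside Latin-1 raises `UnicodeEncodeError`.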

Avamander avatar Jul 22 '25 22:07 Avamander

Hello,

Thanks for reporting this issue. Apparently there are many places in eventlet where Latin-1 is used to encode or decode. Unfortunately, I don't know if there is a reason behind that...

I will have to dig a bit deeper before giving you an answer.

4383 avatar Jul 23 '25 20:07 4383

So, I did some research on my side, and I agree with you. Valid UTF-8 should pass, and header field names should be strings of ASCII characters.

My main concern is that this kind of change could heavily impact existing implementations using Eventlet. We are close to retiring Eventlet, so I wonder whether we really want to implement such changes in the middle of the migration away from it. That's a potential source of breakage.

Do you want to propose a patch?

4383 avatar Jul 24 '25 14:07 4383

I doubt it can cause breakage because it would practically just be less strict/fragile after any changes. Anything non-ASCII is already not passing through, anything ASCII would remain the same.

The easiest would probably be something like this:

for header in response_headers:
    name, value = header
    name, value = name.encode('latin-1'), value.encode('utf-8')
    towrite.append(b'%s: %s\r\n' % (name, value))

I'm unsure about the intended Python version support. It might be possible to write this as a nicer f-string, or even to refactor it out into a wrapper (which would also make it possible for some users to monkeypatch in things like %-encoding). But the core concept would remain the same.
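A rough sketch of what such a wrapper could look like. The name `encode_header` and the `percent_encode` flag are hypothetical and not part of eventlet. It encodes field names as ASCII (rather than Latin-1 as in the snippet above) on the assumption that names are always ASCII. It keeps %-formatting on bytes, which needs Python 3.5 or later, because f-strings only produce `str`.

```python
from urllib.parse import quote


def encode_header(name, value, percent_encode=False):
    # Hypothetical helper for illustration; not an eventlet API.
    if percent_encode:
        # quote() encodes to UTF-8 first, then %-escapes every byte not
        # listed in `safe` (ASCII letters, digits and "_.-~" are always kept).
        value = quote(value, safe=" /:;,=")
    # Field names are assumed to be ASCII; values go out as UTF-8 bytes.
    return b"%s: %s\r\n" % (name.encode("ascii"), value.encode("utf-8"))


towrite = [
    encode_header("Content-Type", "text/plain; charset=utf-8"),
    encode_header("X-City", "Rīga"),
    encode_header("X-City", "Rīga", percent_encode=True),
]
```

With the flag off, the value passes through as raw UTF-8 bytes; with it on, non-ASCII bytes are written as `%XX` escapes, so the line stays pure ASCII.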

Avamander avatar Jul 24 '25 15:07 Avamander

OK, thanks for sharing your thoughts. Would you like to propose a patch?

4383 avatar Jul 24 '25 15:07 4383

@4383 Something like the above, or should I refactor it out into a function?

Avamander avatar Jul 25 '25 14:07 Avamander

Something like the above should be enough. Thanks

4383 avatar Jul 25 '25 17:07 4383