Incorrect (too long) header from server causes Twisted Agent to fail
@pawelmhm reported | |
---|---|
Trac ID | trac#8570 |
Type | enhancement |
Created | 2016-07-07 12:58:22Z |
If Twisted Agent receives an HTTP header whose value exceeds the size limits defined in the protocol, it fails and crashes.
Expected behavior: if the server sends an incorrect header, the client should not fail but simply ignore that header.
To reproduce, create the following Twisted server, which sends a header with 100,000 characters in its value:
import sys
from twisted.python import log
from twisted.web import server, resource
from twisted.internet import reactor

class Simple(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        # Send a single response header whose value is far longer than the
        # client-side line-length limit.
        request.setHeader("Set-Cookie", "a" * 100000)
        return "<html>Hello, world!</html>"

site = server.Site(Simple())
reactor.listenTCP(8080, site)
log.startLogging(sys.stdout)
reactor.run()
A Twisted client trying to reach the above server:
from twisted.internet import reactor
from twisted.web.client import Agent
from twisted.web.http_headers import Headers

agent = Agent(reactor)
d = agent.request(
    'GET',
    'http://localhost:8080',
    Headers({'User-Agent': ['Twisted Web Client Example']}),
    None)

def cbResponse(failure):
    print(failure)

d.addBoth(cbResponse)

def cbShutdown(ignored):
    reactor.stop()

d.addBoth(cbShutdown)

reactor.run()
Output is:
python client.py
[Failure instance: Traceback (failure with no frames): <class 'twisted.web._newclient.ResponseFailed'>: [<twisted.python.failure.Failure exceptions.AttributeError: 'TransportProxyProducer' object has no attribute 'loseConnection'>]
@glyph commented |
---|
The problem with accepting arbitrarily long headers is that you can still DoS a client by having it continue to read from a socket forever, never processing the response, never timing out.
This makes me inclined to "wontfix" this issue, but... I'm not entirely sure how other HTTP clients handle this. What do browsers do? cURL? wget? If other clients typically ignore headers over their max length like this then let's make the change.
@pawelmhm commented |
---|
> What do browsers do? cURL? wget? If other clients typically ignore headers over their max length like this then let's make the change.
curl handles this without trimming headers (but my fish shell crashes immediately after the request, so I'm not entirely sure curl is not leaking anything; maybe it causes some segfaults).
Google Chrome reads the full headers with no problem.
wget refuses to read the headers and retries, printing the following error:
--2016-08-30 08:44:23-- (try: 3) http://localhost:8080/
Connecting to localhost (localhost)|127.0.0.1|:8080... connected.
HTTP request sent, awaiting response... Read error (Cannot allocate memory) in headers.
Retrying.
I think the wget approach is best here in that it prints an instructive error about the impossibility of handling too-long headers. I'm not entirely sure it makes sense to retry, though. wget probably assumes that those headers are the result of a misconfigured server, but this assumption might not be true.
@jlitzingerdev commented |
---|
I made some comments regarding this on a review for #9295; Firefox and Safari on iOS also handle the headers just fine.
While I agree that allowing arbitrarily long headers is dangerous, and I am not advocating for that by default, it does seem like better communication is preferable to a wontfix.
Rationale:
The decision to disallow headers after a certain length is a policy. Whether this policy is good or bad (fwiw I'd say it is good, as it defaults to safe) is less important than the fact that it is a Twisted-specific opinion. With any policy decision there are at least two options:
- Default to sane values and expose configuration to advanced users.
- Clearly communicate that the error is due to a policy decision.
To that end, I'd propose one or both of two paths:
- Ensure that the failure communicates that the header length was the problem, likely by handling lineLengthExceeded in one of the HTTP parsing classes (see the sketch below).
- Allow users to configure an IAgent through a new interface method/init argument. I suggest the former, as it acts as a better contract for IAgent implementations, especially if the configuration object has its own interface.
That said, #2 has implications outside the scope of this ticket and deserves its own ticket/design discussion, so I'd vote for option 1 to address this ticket. Behavior doesn't change, but at least a user knows why.
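For concreteness, here is a minimal sketch of the hook that option 1 refers to, assuming the fix lands in a LineReceiver-derived parser. This is a plain LineReceiver subclass, not Twisted's actual HTTP parser, and the whyStopped attribute and message are made up; it only illustrates that lineLengthExceeded is where a more informative failure could be produced.

from twisted.protocols import basic

class ExplainingLineReceiver(basic.LineReceiver):
    # MAX_LENGTH defaults to 16384 bytes in twisted.protocols.basic.
    whyStopped = None

    def lineReceived(self, line):
        """Normal per-line handling would go here."""

    def lineLengthExceeded(self, line):
        # The default implementation just calls self.transport.loseConnection()
        # with no explanation; overriding it is the chance to record why.
        self.whyStopped = "line exceeded MAX_LENGTH (%d bytes)" % (self.MAX_LENGTH,)
        self.transport.loseConnection()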
Thoughts?
-- Note that I'm willing to put in the work for option 1, I'm not just throwing it out there, but I'd prefer to know there was some agreement with the approach.
@adiroiban commented |
---|
I am happy to review option #11338. For my part, a simple log before closing the connection should be enough. That will help discover the cause of the error, and then you can dig in for your own fix :)
I remember that I looked at passing a message via the exception handling, and it was not straightforward with the current API.
LineReceiver will just close the connection without any message, and loseConnection does not accept an argument:
https://github.com/twisted/twisted/blob/twisted-17.9.0/src/twisted/protocols/basic.py#L638
One way in which I was trying to fix it was to pass a reference to the protocol into the HTTPClientParser, so that the parser could set a whyConnectionWasLost attribute on the HTTP11ClientProtocol, and then the protocol could show a reason when the connection was closed.
I saw this pattern in some other places in Twisted, but I would have preferred to be able to transmit the reason just via an exception.
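For illustration, here is a rough, self-contained sketch of that pattern. The class names and wiring below are made-up stand-ins rather than the real twisted.web._newclient classes, and whyConnectionWasLost is the hypothetical attribute described above.

class SketchParser:
    """Stand-in for HTTPClientParser; holds a back-reference to the protocol."""

    def __init__(self, protocol):
        self._protocol = protocol

    def lineLengthExceeded(self, line):
        # Record the reason on the protocol before dropping the connection.
        self._protocol.whyConnectionWasLost = RuntimeError(
            "response header line too long")
        self._protocol.transport.loseConnection()

class SketchProtocol:
    """Stand-in for HTTP11ClientProtocol."""

    whyConnectionWasLost = None

    def connectionLost(self, reason):
        # Surface the recorded reason instead of a bare ConnectionDone.
        if self.whyConnectionWasLost is not None:
            reason = self.whyConnectionWasLost
        return reason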
For option #2.
If you want the IAgent way, I guess that for the headers length you will have to go through Agent -> HTTPConnectionPool -> HTTPClientFactory -> HTTP11ClientProtocol -> HTTPClientParser.
But just to get the headers length, maybe it is easier to pass the options via the request object: https://github.com/twisted/twisted/blob/twisted-17.9.0/src/twisted/web/_newclient.py#L1430
If you have a real need for any of the above options, I am happy to review the work.
Otherwise I also think that this should be closed as won't fix.
If in the future someone has a real need for this, a new ticket and PR can be created and we can review that work.
Just an update to this one.
With the recent Twisted version, the error is now:
[Failure instance: Traceback (failure with no frames): <class 'twisted.web._newclient.ResponseFailed'>: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
]
Which is expected... when the line is too long, for now Twisted just silently closes the connection.
I think it is best to keep this open and close it once we can have something like:
twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Server returned a long header.
@adiroiban thanks for tracking this
In Scrapy we have an old issue about this which comes back now and again as people are affected while crawling specific websites.
In https://github.com/scrapy/scrapy/pull/5911 I tried to address this with the public API, but things got too messy, and eventually I went with monkey-patching.
Before we merge that change, we would like to know if you are open to either increasing the default maximum length (e.g. from 2**14 to 2**16), or option 2 as discussed here, i.e. providing a public API to change this value.
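For reference, a hedged sketch of what such a monkey-patch could look like; this is an assumption, not necessarily what the Scrapy PR does. HTTPClientParser inherits its per-line limit from LineReceiver, so raising the class attribute lifts the header line limit for the client-side response parser:

from twisted.web._newclient import HTTPClientParser

# Default inherited value is 2 ** 14 (16384 bytes); raising it here affects
# every response parsed by twisted.web.client.Agent in this process, and it
# touches a private module, so it is a workaround rather than a supported API.
HTTPClientParser.MAX_LENGTH = 2 ** 16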
I don't know what to say about 2**16. Maybe 2**15 is a better default value.
At http://browsercookielimits.iain.guru/ I see the limit is 4 KB per header, and in the RFC https://datatracker.ietf.org/doc/html/rfc6265#section-6.1 I only see minimum limits.
Those limits are specifically about the Cookie header, not headers as a whole. It seems the Referer header also gets a limit. But you can load https://www.vapestore.co.uk/ fine in a web browser even though it has a header with 28k+ chars.
2**15 would be OK by me; it certainly would be enough to address the only currently known affected website (https://www.vapestore.co.uk/). I went with 2**16 because https://www.vapestore.co.uk/ was rather close to 2**15, but in the end it is an arbitrary decision; it could be that browsers do not limit headers (besides special ones).
I expect that browsers have some limits... they just might be very big... otherwise you can end up with out-of-memory errors.
Here are the current limits:
- 500 total header count
- 16384 bytes for all headers - this can be increased
https://github.com/twisted/twisted/blob/c0c6440827c47797415b6ddf5bbf0b60a203da02/src/twisted/web/http.py#L2216-L2217
We also have a maximum line length, which can be increased for HTTP:
https://github.com/twisted/twisted/blob/c0c6440827c47797415b6ddf5bbf0b60a203da02/src/twisted/protocols/basic.py#L509
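As a quick illustration (assuming a reasonably recent Twisted), these limits are plain class attributes and can be inspected or overridden directly; the values in the comments are the defaults at the linked revision:

from twisted.protocols import basic
from twisted.web import http

print(http.HTTPChannel.maxHeaders)        # 500: maximum number of headers accepted
print(http.HTTPChannel.totalHeadersSize)  # 16384: combined size of all headers, in bytes
print(basic.LineReceiver.MAX_LENGTH)      # 16384: maximum length of a single line/header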
Right now, both the server and the client HTTP parts share the same HTTP header parsing code.
I think that we can increase the limits.
I would go with conservative values of:
- 2**15 for a single line/header
- 2**16 for the total headers
What do you think?
Sounds good to me.
Adrian, if you have the time for a PR, I am happy to review it. I don't think I will have much time in the near future to work on this. Regards