Incorrect (too long) header from server causes Twisted Agent to fail
@pawelmhm reported | |
---|---|
Trac ID | trac#8570 |
Type | enhancement |
Created | 2016-07-07 12:58:22Z |
If Twisted Agent receives an HTTP header whose value exceeds the size limits defined in the protocol, it fails and crashes.
Expected behavior: if the server sends an incorrect header, the client should not fail but simply ignore that header.
To reproduce, create the following Twisted server, which sends a header with 100,000 characters in its value:
import sys
from twisted.python import log
from twisted.web import server, resource
from twisted.internet import reactor

class Simple(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        # Send a single response header whose value is far longer than the
        # client-side line-length limit.
        request.setHeader("Set-Cookie", "a" * 100000)
        return "<html>Hello, world!</html>"

site = server.Site(Simple())
reactor.listenTCP(8080, site)
log.startLogging(sys.stdout)
reactor.run()
A Twisted client trying to reach the above server:
from twisted.internet import reactor
from twisted.web.client import Agent
from twisted.web.http_headers import Headers

agent = Agent(reactor)
d = agent.request(
    'GET',
    'http://localhost:8080',
    Headers({'User-Agent': ['Twisted Web Client Example']}),
    None)

def cbResponse(failure):
    print(failure)

d.addBoth(cbResponse)

def cbShutdown(ignored):
    reactor.stop()

d.addBoth(cbShutdown)

reactor.run()
Output is:
python client.py
[Failure instance: Traceback (failure with no frames): <class 'twisted.web._newclient.ResponseFailed'>: [<twisted.python.failure.Failure exceptions.AttributeError: 'TransportProxyProducer' object has no attribute 'loseConnection'>]
@glyph commented |
---|
The problem with accepting arbitrarily long headers is that you can still DoS a client by having it continue to read from a socket forever, never processing the response, never timing out.
This makes me inclined to "wontfix" this issue, but... I'm not entirely sure how other HTTP clients handle this. What do browsers do? cURL? wget? If other clients typically ignore headers over their max length like this then let's make the change.
@pawelmhm commented |
---|
> What do browsers do? cURL? wget? If other clients typically ignore headers over their max length like this then let's make the change.
curl handles this without trimming headers (but my fish shell crashes immediately after the request, so I'm not entirely sure curl is not leaking anything; maybe it causes some segfaults).
Google Chrome reads the full headers with no problem.
wget refuses to read the headers and retries, printing the following error:
--2016-08-30 08:44:23-- (try: 3) http://localhost:8080/
Connecting to localhost (localhost)|127.0.0.1|:8080... connected.
HTTP request sent, awaiting response... Read error (Cannot allocate memory) in headers.
Retrying.
I think the wget approach is best here in that it prints an instructive error about the impossibility of handling too-long headers. I'm not entirely sure it makes sense to retry, though. wget probably assumes that those headers are the result of a misconfigured server, but this assumption might not be true.
@jlitzingerdev commented |
---|
I made some comments regarding this on a review for #9295; Firefox and Safari on iOS also handle the headers just fine.
While I agree that allowing arbitrarily long headers is dangerous, and I am not advocating for that by default, it does seem like better communication is preferable to a wontfix.
Rationale:
The decision to disallow headers after a certain length is a policy. Whether this policy is good or bad (fwiw I'd say it is good, as it defaults to safe) is less important than the fact that it is a Twisted-specific opinion. With any policy decision there are at least two options:
- Default to sane values and expose configuration to advanced users.
- Clearly communicate that the error is due to a policy decision.
To that end, I'd propose one or both of two paths:
- Ensure that the failure communicates that the header length was the problem, likely by handling lineLengthExceeded in one of the HTTP parsing classes (see the sketch below).
- Allow users to configure an IAgent through a new interface method/init argument. I suggest the former, as it acts as a better contract for IAgent implementations, especially if the configuration object has its own interface.
That said, #2 has implications outside the scope of this ticket and deserves its own ticket/design discussion, so I'd vote for option 1 to address this ticket. Behavior doesn't change, but at least a user knows why.
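For concreteness, here is a minimal sketch of the hook that option 1 refers to, assuming the fix lands in a LineReceiver-derived parser. This is a plain LineReceiver subclass, not Twisted's actual HTTP parser, and the whyStopped attribute and message are made up; it only illustrates that lineLengthExceeded is where a more informative failure could be produced.

from twisted.protocols import basic

class ExplainingLineReceiver(basic.LineReceiver):
    # MAX_LENGTH defaults to 16384 bytes in twisted.protocols.basic.
    whyStopped = None

    def lineReceived(self, line):
        """Normal per-line handling would go here."""

    def lineLengthExceeded(self, line):
        # The default implementation just calls self.transport.loseConnection()
        # with no explanation; overriding it is the chance to record why.
        self.whyStopped = "line exceeded MAX_LENGTH (%d bytes)" % (self.MAX_LENGTH,)
        self.transport.loseConnection()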
Thoughts?
-- Note that I'm willing to put in the work for option 1, I'm not just throwing it out there, but I'd prefer to know there was some agreement with the approach.
@adiroiban commented |
---|
I am happy to review option #11338. For my part, a simple log before closing the connection should be enough. That will help discover the cause of the error, and then you can dig in for your own fix :)
I remember that I looked at passing a message via the exception handling, and it was not straightforward with the current API.
LineReceiver will just close the connection without any message, and loseConnection does not accept an argument:
https://github.com/twisted/twisted/blob/twisted-17.9.0/src/twisted/protocols/basic.py#L638
One way in which I was trying to fix it was to pass a reference to the protocol into the HTTPClientParser, so that the parser could set a whyConnectionWasLost attribute on the HTTP11ClientProtocol, and then the protocol could show a reason when the connection was closed.
I saw this pattern in some other places in Twisted, but I would have preferred to be able to transmit the reason just via an exception.
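For illustration, here is a rough, self-contained sketch of that pattern. The class names and wiring below are made-up stand-ins rather than the real twisted.web._newclient classes, and whyConnectionWasLost is the hypothetical attribute described above.

class SketchParser:
    """Stand-in for HTTPClientParser; holds a back-reference to the protocol."""

    def __init__(self, protocol):
        self._protocol = protocol

    def lineLengthExceeded(self, line):
        # Record the reason on the protocol before dropping the connection.
        self._protocol.whyConnectionWasLost = RuntimeError(
            "response header line too long")
        self._protocol.transport.loseConnection()

class SketchProtocol:
    """Stand-in for HTTP11ClientProtocol."""

    whyConnectionWasLost = None

    def connectionLost(self, reason):
        # Surface the recorded reason instead of a bare ConnectionDone.
        if self.whyConnectionWasLost is not None:
            reason = self.whyConnectionWasLost
        return reason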
For option #2.
If you want the IAgent way, I guess that for the headers length you will have to go through Agent -> HTTPConnectionPool -> HTTPClientFactory -> HTTP11ClientProtocol -> HTTPClientParser.
But just to get the headers length, maybe it is easier to pass the options via the request object: https://github.com/twisted/twisted/blob/twisted-17.9.0/src/twisted/web/_newclient.py#L1430
If you have a real need for any of the above options, I am happy to review the work.
Otherwise I also think that this should be closed as won't fix.
If in the future someone has a real need for this, a new ticket and PR can be created and we can review that work.
Just an update to this one.
With the recent Twisted version, the error is now:
[Failure instance: Traceback (failure with no frames): <class 'twisted.web._newclient.ResponseFailed'>: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
]
Which is expected... when the line is too long, for now Twisted just silently closes the connection.
I think it is best to keep this open and close it once we can have something like:
twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Server returned a long header.
@adiroiban thanks for tracking this
In Scrapy we have an old issue about this which comes back now and again as people are affected while crawling specific websites.
In https://github.com/scrapy/scrapy/pull/5911 I tried to address this with the public API, but things got too messy, and eventually I went with monkey-patching.
Before we merge that change, we would like to know if you are open to either increasing the default maximum length (e.g. from 2**14 to 2**16), or option 2 as discussed here, i.e. providing a public API to change this value.
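For reference, a hedged sketch of what such a monkey-patch could look like; this is an assumption, not necessarily what the Scrapy PR does. HTTPClientParser inherits its per-line limit from LineReceiver, so raising the class attribute lifts the header line limit for the client-side response parser:

from twisted.web._newclient import HTTPClientParser

# Default inherited value is 2 ** 14 (16384 bytes); raising it here affects
# every response parsed by twisted.web.client.Agent in this process, and it
# touches a private module, so it is a workaround rather than a supported API.
HTTPClientParser.MAX_LENGTH = 2 ** 16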
I don't know what to say about 2**16. Maybe 2**15 is a better default value.
At http://browsercookielimits.iain.guru/ I see the limit is 4 KB per header, and in the RFC https://datatracker.ietf.org/doc/html/rfc6265#section-6.1 I only see minimum limits.
Those limits are specifically about the Cookie header, not headers as a whole. It seems the Referer header also gets a limit. But you can load https://www.vapestore.co.uk/ fine in a web browser even though it has a header with 28k+ chars.
2**15 would be OK by me; it certainly would be enough to address the only currently known affected website (https://www.vapestore.co.uk/). I went with 2**16 because https://www.vapestore.co.uk/ was rather close to 2**15, but in the end it is an arbitrary decision; it could be that browsers do not limit headers (besides special ones).
I expect that browsers have some limits... they just might be very big... otherwise you can end up with out-of-memory errors.
Here are the current limits:
- 500 total header count
- 16384 bytes for all headers - this can be increased
https://github.com/twisted/twisted/blob/c0c6440827c47797415b6ddf5bbf0b60a203da02/src/twisted/web/http.py#L2216-L2217
We also have a maximum line length, which can be increased for HTTP:
https://github.com/twisted/twisted/blob/c0c6440827c47797415b6ddf5bbf0b60a203da02/src/twisted/protocols/basic.py#L509
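As a quick illustration (assuming a reasonably recent Twisted), these limits are plain class attributes and can be inspected or overridden directly; the values in the comments are the defaults at the linked revision:

from twisted.protocols import basic
from twisted.web import http

print(http.HTTPChannel.maxHeaders)        # 500: maximum number of headers accepted
print(http.HTTPChannel.totalHeadersSize)  # 16384: combined size of all headers, in bytes
print(basic.LineReceiver.MAX_LENGTH)      # 16384: maximum length of a single line/header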
Right now, both the server and the client HTTP parts share the same HTTP header parsing code.
I think that we can increase the limits.
I would go with conservative values of:
- 2**15 for a single line/header
- 2**16 for the total headers
What do you think?
Sounds good to me.
Adrian, if you have the time for a PR, I am happy to review it. I don't think I will have much time in the near future to work on this. Regards