
Occasional errors `ClientPayloadError: Response payload is not completed` in client requests

Open dangusev opened this issue 5 years ago • 20 comments

Hi,

I'm noticing `ClientPayloadError: Response payload is not completed` errors with some URLs. You can request the same URL multiple times and it only fails some of the time.

Might be related to https://github.com/aio-libs/aiohttp/issues/2954

Please see the traceback below:

In [20]: r = await s.get('http://rayih.com/podcast/0001.xml')

In [21]: await r.read()
---------------------------------------------------------------------------
ClientPayloadError                        Traceback (most recent call last)
<ipython-input-21-39c3a552a171> in async-def-wrapper()

***/lib/python3.7/site-packages/aiohttp/client_reqrep.py in read(self)
    971         if self._body is None:
    972             try:
--> 973                 self._body = await self.content.read()
    974                 for trace in self._traces:
    975                     await trace.send_response_chunk_received(self._body)

***/lib/python3.7/site-packages/aiohttp/streams.py in read(self, n)
    356             blocks = []
    357             while True:
--> 358                 block = await self.readany()
    359                 if not block:
    360                     break

***/lib/python3.7/site-packages/aiohttp/streams.py in readany(self)
    378         # without feeding any data
    379         while not self._buffer and not self._eof:
--> 380             await self._wait('readany')
    381
    382         return self._read_nowait(-1)

***/lib/python3.7/site-packages/aiohttp/streams.py in _wait(self, func_name)
    294             if self._timer:
    295                 with self._timer:
--> 296                     await waiter
    297             else:
    298                 await waiter

ClientPayloadError: Response payload is not completed

Steps to reproduce

Please run this script several times. In my case it fails about 1 time out of 10. curl 7.54.0 and Firefox 71.0b12 work fine and are stable.

import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('http://rayih.com/podcast/0001.xml') as response:
            await response.read()

asyncio.run(main())
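
Since the failure is intermittent, a small loop makes the rate easier to observe; a minimal sketch (the loop and the counting are just for illustration):

import asyncio
import aiohttp

URL = 'http://rayih.com/podcast/0001.xml'

async def attempt() -> None:
    # Fresh session per request, to mirror re-running the script by hand.
    async with aiohttp.ClientSession() as session:
        async with session.get(URL) as response:
            await response.read()

async def main(attempts: int = 10) -> None:
    failures = 0
    for i in range(attempts):
        try:
            await attempt()
        except aiohttp.ClientPayloadError as exc:
            failures += 1
            print(f'attempt {i + 1}: {exc!r}')
    print(f'{failures}/{attempts} requests failed')

asyncio.run(main())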

Your environment

client aiohttp 3.6.2, python 3.7.5, OS X 10.12.6 (Sierra)

dangusev avatar Nov 27 '19 22:11 dangusev

The issue still persists as of 3.8.1. It happens randomly when downloading large files. Is there possibly any workaround for this?

SKevo18 avatar Jul 12 '22 08:07 SKevo18

Any follow ups on this one?

joelcorporan avatar Jul 26 '22 22:07 joelcorporan

Also looking for a follow-up to this. Attempting to get large numbers of RSS feeds with asyncio results in this exception for me.

zadamg avatar Nov 01 '22 22:11 zadamg

This part of the client is pretty much unusable because of this bug... No luck with the socket keepalive hack either.

alexjc avatar Feb 15 '23 22:02 alexjc

Hey everyone, I just released v3.9.4rc0. It'll hopefully help us get more complete tracebacks and that, in turn, will let us know the underlying causes for this exception being raised — there may be several distinct behaviors underneath and we need to discover what they are.

If you have the ability to do so, please try out this pre-release in environments where you can reproduce your problems and come back with logs containing complete tracebacks. Let's figure this out together!

webknjaz avatar Feb 16 '24 14:02 webknjaz

@webknjaz - I installed rc0. For proper context, I'm using aiohttp indirectly through the third-party package aiosseclient (as aio-libs' own aiohttp-sse doesn't appear to have a Client), but given the traceback, I'm not sure that's super-relevant; SSE can have multiple payloads with just line endings, and I'm wondering if that's related here (and the linked issue from the original comment).

Please do let me know if there's anything I can do with rc0 to assist.

Traceback (most recent call last):
  File "[...]/aiohttp/client_proto.py", line 94, in connection_lost
    uncompleted = self._parser.feed_eof()
  File "aiohttp/_http_parser.pyx", line 507, in aiohttp._http_parser.HttpParser.feed_eof
aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.

The above exception was the direct cause of the following exception:

[snip asyncio setup]
  File "sse.py", line 10, in main
    async for event in aiosseclient('https://stream.wikimedia.org/v2/stream/page-undelete'):
  File "[...]/aiosseclient.py", line 128, in aiosseclient
    async for line in response.content:
  File "[...]/aiohttp/streams.py", line 50, in __anext__
    rv = await self.read_func()
  File "[...]/aiohttp/streams.py", line 317, in readline
    return await self.readuntil()
  File "[...]/aiohttp/streams.py", line 351, in readuntil
    await self._wait("readuntil")
  File "[...]/aiohttp/streams.py", line 312, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>

https://github.com/ebraminio/aiosseclient/blob/main/aiosseclient.py#L128

Wikimedia is a popular platform for testing SSE streams; you can use a stream that doesn't get a lot of traffic, like https://stream.wikimedia.org/v2/stream/page-undelete, to replicate. It'll fail in about 60s.

import asyncio
import aiohttp
from aiosseclient import aiosseclient  # pip install aiosseclient

async def main():
    async for event in aiosseclient('https://stream.wikimedia.org/v2/stream/recentchange'):
        print(event)

if __name__ == '__main__':
    asyncio.run(main())

DefiDebauchery avatar Feb 17 '24 04:02 DefiDebauchery

@DefiDebauchery it does look relevant at a glance, thanks! I saw several issues with that client error exception but no context. And I'm pretty sure there are several different underlying causes. It's useful to be able to collect them and see if some aren't handled properly on the aiohttp side. aiohttp should only raise errors for misbehaving servers and other legitimate problems, obviously.

Replicating these problems in isolated environments would be even better, as relying on third-party services is not exactly easy to debug, especially with an added layer of TLS in traffic captures.

webknjaz avatar Feb 17 '24 23:02 webknjaz

> Wikimedia is a popular platform for testing SSE streams; you can use a stream that doesn't get a lot of traffic, like https://stream.wikimedia.org/v2/stream/page-undelete, to replicate. It'll fail in about 60s.

Ran it for 5 mins with no exceptions. You'll need to provide a better reproducer.

Dreamsorcerer avatar Feb 19 '24 19:02 Dreamsorcerer

Never mind, the page-undelete stream fails, but the recentchange stream in the code block works fine.

Dreamsorcerer avatar Feb 19 '24 19:02 Dreamsorcerer

I don't think the error is wrong; the connection is getting dropped after 60 seconds of inactivity. It seems like an abrupt disconnect, so the content of the message is not completed.

Dreamsorcerer avatar Feb 19 '24 19:02 Dreamsorcerer

So, that SSE one looks to me like you either need to handle the error and retry, or there is something in aiosseclient that should be doing the retry (or perhaps sending some kind of ping to keep the connection open).
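
For the first option, a rough sketch of what that retry could look like on top of the reproducer above (the backoff policy here is purely illustrative):

import asyncio
import aiohttp
from aiosseclient import aiosseclient  # pip install aiosseclient

async def consume(url: str, max_retries: int = 5) -> None:
    for attempt in range(max_retries):
        try:
            # Open the SSE stream and keep printing events.
            async for event in aiosseclient(url):
                print(event)
            return  # generator ended cleanly
        except (aiohttp.ClientPayloadError, aiohttp.ServerDisconnectedError):
            # Server dropped the connection mid-message; back off and reconnect.
            await asyncio.sleep(2 ** attempt)

asyncio.run(consume('https://stream.wikimedia.org/v2/stream/page-undelete'))

A production client would also keep track of the last received event ID so the server can resume the stream on reconnect, which is what the SSE protocol expects.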

The original reproducer at the top doesn't result in any exceptions when I test now (and the issue is from over 4 years ago).

So, I'd suggest we close this one as I don't think there's anything wrong here.

Dreamsorcerer avatar Feb 19 '24 19:02 Dreamsorcerer

@Dreamsorcerer shouldn't it cause a more specific ServerDisconnectedError? ClientPayloadError implies a problem with the content received over a working connection, not a connection interruption, in my opinion.

webknjaz avatar Feb 19 '24 21:02 webknjaz

Not to mention that streaming connections (including SSE) aren't intended to be closed after a short time of inactivity; that would be the job of the ClientTimeout configuration (or the server, but I don't think it's the server giving up)
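
For anyone tuning that: the client-side knobs live on aiohttp.ClientTimeout, and passing one to the session looks roughly like this (the values are purely illustrative):

import aiohttp

# total defaults to 5 minutes, so long-lived streaming requests usually set
# total=None and rely on sock_read (max idle time between received chunks).
timeout = aiohttp.ClientTimeout(total=None, sock_connect=30, sock_read=90)

async def fetch(url: str) -> bytes:
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(url) as response:
            return await response.read()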

DefiDebauchery avatar Feb 19 '24 21:02 DefiDebauchery

@DefiDebauchery have you tried running a tcpdump there to observe what's happening?

webknjaz avatar Feb 19 '24 21:02 webknjaz

In response to both of you, see the first line of the traceback: `File "[...]/aiohttp/client_proto.py", line 94, in connection_lost`

As far as I can see, the disconnect is not triggered by us and no further data is sent from the server (that function is generally called from asyncio internals after the connection is dropped). So, when it abruptly disconnects, our code tries to feed_eof() to close out the message, which results in that error.

We could maybe figure out some improvement so it does produce a ServerDisconnectedError without a parsing error, but that won't make much difference in making that application work correctly (i.e. it'd just be catching a different exception).

Dreamsorcerer avatar Feb 19 '24 21:02 Dreamsorcerer

> our code tries to feed_eof() to close out the message, which results in that error.

Maybe we can tweak feed_eof() in the connection_lost() case, so that it checks whether the connection is 'chunked' and, if so, feeds a closing chunk (0\r\n\r\n, I think) instead.

Dreamsorcerer avatar Feb 19 '24 21:02 Dreamsorcerer

No, forget that. feed_eof() is literally just checking if the message is complete, and it's not. So, again, I think the error is correct.

It appears that the server accepts the connection, sends the headers to say it's a chunked message, waits 60 seconds, then closes the connection without sending a zero-chunked message to indicate the message is complete.
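
For reference, a well-formed chunked body ends with a zero-length chunk; roughly, the last bytes on the wire of a complete response look like this (a generic illustration, not a capture from this server):

# Tail of a *complete* chunked HTTP response body, as raw bytes.
# The stream discussed here is cut off before the final zero-length chunk,
# which is exactly what feed_eof() complains about.
complete_tail = (
    b"a\r\n"          # size of the next chunk, in hex (10 bytes)
    b"data: hi\n\n"   # the chunk itself (a small SSE event here)
    b"\r\n"           # CRLF closing the chunk
    b"0\r\n"          # zero-length chunk marks the end of the message
    b"\r\n"           # final CRLF after the (empty) trailer section
)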

Dreamsorcerer avatar Feb 19 '24 21:02 Dreamsorcerer

ServerDisconnectedError happens after that. If the payload isn't incomplete but the protocol hasn't received EOF (I'm not entirely clear what the difference is...), then it throws ServerDisconnectedError.

Dreamsorcerer avatar Feb 19 '24 21:02 Dreamsorcerer

So this is the output I get when running my above script through strace: `strace -f -e trace=network -s 10000 python3 sse.py`

Unfortunately, I'm not well-versed enough in this kind of low-level connection data to make sense of it, but I'm adding it in case someone is.

[Setup packets]
[-- ~60s of nothing --]
[pid 184817] recvfrom(6, "\27\3\3\0\23\24\357\215}\272\270\303\3S\225\225j\4\33n}\375\244\6", 262144, 0, NULL, NULL) = 24
strace: Process 184823 attached
[pid 184818] +++ exited with 0 +++
[pid 184823] sendto(5, "\0", 1, 0, NULL, 0) = 1
[pid 184823] +++ exited with 0 +++
recvfrom(4, "\0", 4096, 0, NULL, NULL)  = 1
recvfrom(4, 0x55b77e5e8270, 4096, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

Traceback (most recent call last):
  File "[]/aiohttp/client_proto.py", line 94, in connection_lost
    uncompleted = self._parser.feed_eof()
  File "aiohttp/_http_parser.pyx", line 507, in aiohttp._http_parser.HttpParser.feed_eof
aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.

DefiDebauchery avatar Feb 19 '24 21:02 DefiDebauchery

We're seeing this on AWS after downloading a load of objects off of S3. It reproduces very infrequently and unreliably there. We've worked around it with some additional retry logic.

The SSE script posted above by @DefiDebauchery is a perfect reproducer. It always happens after exactly 15 minutes. It looks like it is the server that is closing the connection (15 minutes of events? no more soup for you!). I confirmed with some debug tracing that the initiator of the crash is the socket being closed by the peer: execution goes into _SelectorSocketTransport::connection_lost and associated machinery, and then ends up in one or the other of these two legs of this if-statement in HttpPayloadParser::feed_eof() (aiohttp/http_parser.py):

        elif self._type == ParseState.PARSE_LENGTH:
            raise ContentLengthError(
                "Not enough data for satisfy content length header."
            )
        elif self._type == ParseState.PARSE_CHUNKED:
            raise TransferEncodingError(
                "Not enough data for satisfy transfer length header."
            )

and tcpdump confirms peer closes connection:

10:19:08.938320 IP 103.102.166.224.https > 192.168.14.143.35994: Flags [P.], seq 39185783:39187076, ack 811, win 64240, length 1293
10:19:08.938784 IP 192.168.14.143.35994 > 103.102.166.224.https: Flags [.], ack 39187076, win 65535, length 0
10:19:08.998630 IP 103.102.166.224.https > 192.168.14.143.35994: Flags [FP.], seq 39187076:39187100, ack 811, win 64240, length 24
10:19:08.999275 IP 192.168.14.143.35994 > 103.102.166.224.https: Flags [P.], seq 811:835, ack 39187101, win 65535, length 24
10:19:08.999821 IP 192.168.14.143.35994 > 103.102.166.224.https: Flags [F.], seq 835, ack 39187101, win 65535, length 0
10:19:08.999939 IP 103.102.166.224.https > 192.168.14.143.35994: Flags [.], ack 835, win 64240, length 0
10:19:09.000269 IP 103.102.166.224.https > 192.168.14.143.35994: Flags [.], ack 836, win 64239, length 0

With a bit of tracing I determined that, upon entry to HttpPayloadParser::feed_eof(), we have:

self._type == ParseState.PARSE_CHUNKED
isinstance(self.payload, aiohttp.streams.StreamReader)
str(self.payload) == '<StreamReader w=<Future pending cb=[Task.task_wakeup()]>>'
self.payload._size == 0
self.payload.total_bytes == 37914638  # (or whatever)
self.payload.buffer == deque([])

I think this safely confirms the theory that the server is closing the connection without sending a zero-sized chunk, so I think @Dreamsorcerer is correct on this one.

gtedesco-r7 avatar Mar 09 '24 03:03 gtedesco-r7