Received "Response payload is not completed" when reading response

Open pablogsal opened this issue 5 years ago • 76 comments

🐞 Describe the bug

A ClientPayloadError: Response payload is not completed exception is raised when reading the response of a GET request to the GitHub REST API (though the specific API should not be especially relevant). The response seems correct, and the same request via curl succeeds every time.

💡 To Reproduce

Make many requests against an endpoint of the GitHub API; the error occurs intermittently.
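
A minimal repro sketch; the exact URL and iteration count are illustrative, not from the original report:

import asyncio

import aiohttp


async def main() -> None:
    async with aiohttp.ClientSession() as session:
        for _ in range(1_000):
            # Any GitHub REST endpoint will do; the failure is intermittent.
            url = "https://api.github.com/repos/testorg/test-debian/labels"
            async with session.get(url) as resp:
                await resp.read()  # occasionally raises ClientPayloadError


asyncio.run(main())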

💡 Expected behavior

The response is correctly parsed.

📋 Logs/tracebacks

  File "/opt/lib/python3.8/site-packages/gidgethub/aiohttp.py", line 20, in _request
    return response.status, response.headers, await response.read()
  File "/opt/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 973, in read
    self._body = await self.content.read()
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 358, in read
    block = await self.readany()
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 380, in readany
    await self._wait('readany')
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 296, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Some contextual information at the time of the exception (at aiohttp/streams.py, line 358, in read):

blocks =  [b'[\n  {\n    "id": 2941521,\n    "node_id": "MDU6TGFiZWwyOTQxNTIx",\n    "url": "https://REDACTED_GITHUB_URL/repos/testorg/test-debian/labels/skip-issue",\n    "name": "skip-issue",\n    "color": "000000",\n    "default": false\n  }\n]\n']

n=-1

As you can see, the blocks contain the whole payload (the full JSON), but aiohttp still complains about the payload not being completed.

📋 Your version of Python

$ python --version
Python 3.8.1

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: [email protected]
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: async-timeout, attrs, chardet, multidict, yarl
Required-by: raven-aiohttp, pytest-aiohttp, aiojobs
$ python -m pip show multidict
Name: multidict
Version: 4.7.3
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.4.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /Users/pgalindo3/.pyenv/versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: idna, multidict
Required-by: aiohttp

📋 Additional context

I suspect this has to do with this comment:

https://github.com/aio-libs/aiohttp/blob/1740c99159176344b02006446b4302d0f92f9ac2/aiohttp/streams.py#L352-L356

as this is where it is failing, and the situation seems to be the one described in the TODO.

pablogsal avatar Feb 17 '20 15:02 pablogsal

CC: @socketpair @asvetlov as the TODO was added in https://github.com/aio-libs/aiohttp/commit/5c4cb823b35fb452c67fb8070c417df5980c2fd8

pablogsal avatar Feb 17 '20 15:02 pablogsal

I'm curious if it's somehow related to #4549.

webknjaz avatar Feb 18 '20 23:02 webknjaz

I'm curious if it's somehow related to #4549.

I encountered these two problems at the same time: #4549 and #4581 (Python 3.8.4, aiohttp 3.6.2).

I think they are related, and the core condition seems to be frequent requests to a single domain.

WH-2099 avatar Jul 20 '20 12:07 WH-2099

CC: @socketpair @asvetlov as the TODO was added in https://github.com/aio-libs/aiohttp/commit/5c4cb823b35fb452c67fb8070c417df5980c2fd8

Changing the while to an if does not seem to help.

WH-2099 avatar Jul 21 '20 01:07 WH-2099

Hi there, I've received this error too. My use case is a reverse proxy in front of gunicorn. The other reports of this error in this repo appear to involve poorly behaved HTTP servers, which I think is a separate issue.

My code streams the proxy body back to upstream, using code roughly like:

response = web.StreamResponse(headers=response_headers, status=status)
await response.prepare(request)  # headers must be sent before writing the body
async for chunk in gunicorn_response.content.iter_chunked(64 * 1_024):
    await response.write(chunk)
await response.write_eof()
return response

What seems to be happening is that gunicorn returns a Connection: close header and then there is a race condition between aiohttp reading the body and the connection being closed. If I get the data out of aiohttp in time, it works, but sometimes this stack trace is triggered.

Investigation:

  • Calling await gunicorn_response.content.wait_eof() prior to calling iter_chunked reduces the incidence of the error, but it still happens occasionally.
  • Setting the Connection: keep-alive header in the reverse proxy's request to gunicorn resolves the issue. This is an acceptable workaround for me (see the sketch after this list).
  • Changing the while to an if in the source code, in conjunction with calling wait_eof() prior to iterating over the body, resolves the issue.
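
A minimal sketch of that keep-alive workaround, assuming aiohttp on both sides; BACKEND_URL and the handler wiring are illustrative, not my actual proxy code:

import aiohttp
from aiohttp import web

BACKEND_URL = "http://127.0.0.1:8000/"  # hypothetical gunicorn backend


async def proxy(request: web.Request) -> web.StreamResponse:
    async with aiohttp.ClientSession() as session:
        # Asking gunicorn to keep the connection alive avoids the race
        # between reading the body and the connection being closed.
        async with session.get(
            BACKEND_URL, headers={"Connection": "keep-alive"}
        ) as backend:
            response = web.StreamResponse(status=backend.status)
            await response.prepare(request)
            async for chunk in backend.content.iter_chunked(64 * 1_024):
                await response.write(chunk)
            await response.write_eof()
            return response

(Header forwarding is omitted for brevity.)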

tildedave avatar Jun 23 '21 13:06 tildedave

We have this happening between an aiohttp 3.8.1 client and server in production; trying to reproduce.

thehesiod avatar Feb 04 '22 01:02 thehesiod

I've just experienced the same issue via s3fs, downloading a few hundred files from S3.

Python: 3.7.12, aiohttp: 3.8.1, yarl: 1.7.2, multidict: 5.2.0

Traceback (most recent call last):
  File "dataset_io.py", line 336, in download
    filesystem.get(self._location, self._download_location, recursive=True)
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 529, in _get
    coros, batch_size=batch_size, callback=callback
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 249, in _run_coros_in_chunks
    await asyncio.gather(*chunk, return_exceptions=return_exceptions),
  File "/root/.pyenv/versions/3.7.12/lib/python3.7/asyncio/tasks.py", line 414, in wait_for
    return await fut
  File "/.venv/lib/python3.7/site-packages/s3fs/core.py", line 1002, in _get_file
    chunk = await body.read(2 ** 16)
  File "/.venv/lib/python3.7/site-packages/aiobotocore/response.py", line 53, in read
    chunk = await self.__wrapped__.read(amt if amt is not None else -1)
  File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 385, in read
    await self._wait("read")
  File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 304, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Sorry I don't have much more information here, other than the fact that this is occurring on Python 3.7, whereas #4549 has been described as occurring on Python 3.8 only.

tharradine avatar Feb 10 '22 23:02 tharradine

I got this error on Python 3.9 with aiohttp 3.8.1 on Windows. I haven't been able to reproduce.

My server failed around the same time as the error. The backoff library re-ran the request, and I got the ClientPayloadError. Is this expected? Shouldn't losing the server raise ConnectionError instead?
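
For reference, roughly how the retry was wired up with the backoff library; the function name and retry policy here are illustrative:

import aiohttp
import backoff


# Re-runs the request on client errors, as described above.
@backoff.on_exception(backoff.expo, aiohttp.ClientError, max_tries=3)
async def fetch(session: aiohttp.ClientSession, url: str) -> bytes:
    async with session.get(url) as resp:
        return await resp.read()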

a-gn avatar May 12 '22 16:05 a-gn

I have the same issue with Python 3.10

aralroca avatar Jun 20 '22 15:06 aralroca

Same error here!

joelcorporan avatar Jul 26 '22 22:07 joelcorporan

I've just experienced the same issue via s3fs, downloading a few hundred files from S3.

I am also getting that error using iter_chunked when downloading files from S3.

junbaibai0719 avatar Sep 08 '22 01:09 junbaibai0719

Every time I download the last few files, this error message appears.

Mashuai1314 avatar Dec 09 '22 07:12 Mashuai1314

Hi, is there any fix for this? Appreciate it.

gelodefaultbrain avatar Apr 20 '23 14:04 gelodefaultbrain

My services occasionally crash with this error, on Python 3.10.12 with aiohttp 3.8.5.

What are our obstacles in isolating this issue, before we can fix it?

I'll make sure to hold on to the logs the next time it arises. Do we know what's causing it?

Thanks!

TheOnlyWayUp avatar Aug 21 '23 14:08 TheOnlyWayUp

What are our obstacles in isolating this issue, before we can fix it?

Not sure. Maybe we can figure out a message that causes the problem, or maybe it needs multiple steps to trigger. It looks like it might be related to #5238, so maybe we can get that PR updated and merged.

Dreamsorcerer avatar Aug 21 '23 20:08 Dreamsorcerer

I'll try that PR out and will report if something breaks.

TheOnlyWayUp avatar Aug 26 '23 07:08 TheOnlyWayUp

@Dreamsorcerer @TheOnlyWayUp Did you figure this out? It may be a session-settings issue with respect to stale connections. I am running into this issue with langchainjs, which uses OpenAI's Python client.

ShantanuNair avatar Sep 11 '23 10:09 ShantanuNair

All discussion is above, so feel free to follow up. PR #5238 could be worked on to get it merged, or if you can figure out a test (that runs in our CI) to reproduce the issue, then that would be a great help.
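
A hedged sketch of one way such a test might provoke the error deterministically: a raw server that advertises more bytes than it sends, then closes the connection. This is only an illustration of the suspected failure mode, not an actual CI test:

import asyncio

import aiohttp


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    await reader.readline()  # consume the request line; headers are ignored
    # Promise 100 bytes but send only 5, then close the connection.
    writer.write(b"HTTP/1.1 200 OK\r\nContent-Length: 100\r\n\r\nshort")
    await writer.drain()
    writer.close()


async def main() -> None:
    server = await asyncio.start_server(handle, "127.0.0.1", 8080)
    async with server, aiohttp.ClientSession() as session:
        async with session.get("http://127.0.0.1:8080/") as resp:
            await resp.read()  # raises ClientPayloadError


asyncio.run(main())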

Dreamsorcerer avatar Sep 11 '23 12:09 Dreamsorcerer

I started to have this issue with 3.8.6 on PyPy 3.9.18, consistently on CI and 80% of the time on my local box, for a very specific streaming unit test (so I can trigger it quite easily with a simple pytest run). It was working fine on 3.8.5.

The funny thing is that, reading your changelog, I saw the change for lenient parsing in llhttp and that -X dev mode does the parsing differently. So, out of ideas, I tried to run pytest with python -X dev -m pytest mytest, and suddenly the error was gone. If I remove -X dev again, the error is back. Strangely, this works great on any other Python version.

Hope this helps somehow.

The stack trace is consistent with the others in this thread:

../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/client_reqrep.py:1043: in read
    self._body = await self.content.read()
../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/streams.py:375: in read
    block = await self.readany()
../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/streams.py:397: in readany
    await self._wait("readany")
../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/streams.py:304: in _wait
    await waiter
/usr/lib/pypy3.9/asyncio/futures.py:284: in __await__
    yield self  # This tells Task to wait for completion.
/usr/lib/pypy3.9/asyncio/tasks.py:328: in __wakeup
    future.result()

lmazuel avatar Oct 20 '23 23:10 lmazuel

I tried to run pytest with python -X dev -m pytest mytest, and suddenly the error was gone. If I remove -X dev again, the error is back.

Hmm, that's surprising. I wonder if strict parsing is somehow causing the connection to get closed without raising an error, thus forcing it to open a new connection...

Dreamsorcerer avatar Oct 20 '23 23:10 Dreamsorcerer

I can confirm this on Python 3.11 / aiohttp 3.8.6.

If I downgrade to 3.8.5, I don't have a problem if I do:

if method == "GET":
    await session.get(...)
elif method == "POST":
    await session.post(...)

Strangely, if I just do session.request(method, ...), I get the error on 3.8.5 as well.

rlippmann avatar Nov 09 '23 18:11 rlippmann

The possible fix (depending on what the root cause really is) is in 3.9.0b1, so please try to reproduce with that.

Dreamsorcerer avatar Nov 09 '23 19:11 Dreamsorcerer

Nope, still failing.

rlippmann avatar Nov 09 '23 21:11 rlippmann

Then someone will need to do some more debugging and/or figure out a reliable test that we can use to reproduce the behaviour.

Dreamsorcerer avatar Nov 10 '23 16:11 Dreamsorcerer

Another data point: this is happening to our Octomachinery-based bots (Chronographer and Patchback), which are built on top of gidgethub + aiohttp, just like in Pablo's traceback. It's happening under CPython 3.9.13 on the quay.io/fedora/python-39:latest image with aiohttp 3.6.2. The bots run Sentry, so it's possible to look a bit deeper into the tracebacks: https://github.com/sanitizers/octomachinery/issues/59. Upon inspecting them, I noticed that they are related to GET and PATCH requests to the GitHub APIs. Nothing really useful.

Judging from the other hints in this issue, this might be reproducible in slow environments, such as ones with high network latency, or on PyPy (which is often slower soon after startup, before the JIT warms up).

Back to gidgethub: it uses aiohttp.ClientSession.request() — https://github.com/gidgethub/gidgethub/blob/7474cae/gidgethub/aiohttp.py#L19 — which is in line with one of the comments above suggesting that calling dedicated methods like .get() and .patch() instead of .request() might yield differing behavior.

Anyway, having Sentry in place, we could probably make some educated guesses and instrument my bots to log more things. If anybody has ideas, I'm all ears...

webknjaz avatar Nov 29 '23 16:11 webknjaz

Comparing .get(), .patch(), and .request(), I concluded that they are virtually equivalent and haven't changed across versions, so your observation, @rlippmann, must be caused by something else. Also note that all of these methods return async context managers and should be used through async with, not await, for proper cleanup of the underlying response object.
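
To illustrate, a minimal sketch (the URL is a placeholder):

import asyncio

import aiohttp


async def main() -> None:
    url = "https://example.com/"  # placeholder
    async with aiohttp.ClientSession() as session:
        # Equivalent forms; each must be entered with `async with` so the
        # underlying connection is released back to the pool properly.
        async with session.request("GET", url) as resp:
            await resp.read()
        async with session.get(url) as resp:
            await resp.read()


asyncio.run(main())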

webknjaz avatar Nov 29 '23 17:11 webknjaz

So the exception is set here: https://github.com/aio-libs/aiohttp/blob/213d1b22d42ce8efc54d7858b490e920fcdb4f0a/aiohttp/client_proto.py#L99.

I wonder if one of these tests is a reproducer (when the exception processing is unsuppressed):

  • https://github.com/aio-libs/aiohttp/blob/213d1b22d42ce8efc54d7858b490e920fcdb4f0a/tests/test_web_sendfile_functional.py#L951-L989
  • https://github.com/aio-libs/aiohttp/blob/213d1b22d42ce8efc54d7858b490e920fcdb4f0a/tests/test_client_functional.py#L1897-L1913
  • https://github.com/aio-libs/aiohttp/blob/213d1b22d42ce8efc54d7858b490e920fcdb4f0a/tests/test_web_server.py#L109-L130

webknjaz avatar Nov 30 '23 05:11 webknjaz

So the exception is set here:

https://github.com/aio-libs/aiohttp/blob/213d1b22d42ce8efc54d7858b490e920fcdb4f0a/aiohttp/client_proto.py#L99

That code suppresses the actual exception. I think we need to at least log the original exception or, ideally, chain it to the new one (like raise ... from ...) somehow.
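
A minimal sketch of the chaining idea, not the actual patch; parser here is a hypothetical stand-in for the internal HTTP parser:

from aiohttp.client_exceptions import ClientPayloadError
from aiohttp.http_exceptions import ContentLengthError


def finish_payload(parser) -> None:
    try:
        parser.feed_eof()  # hypothetical call mirroring connection_lost()
    except ContentLengthError as underlying:
        # `raise ... from ...` records the original error as __cause__, so the
        # root cause survives into the user's traceback instead of being
        # swallowed.
        raise ClientPayloadError("Response payload is not completed") from underlying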

Dreamsorcerer avatar Nov 30 '23 13:11 Dreamsorcerer

That code suppresses the actual exception. I think we need to at least log the original exception or, ideally, chain it to the new one (like raise ... from ...) somehow.

Yes, I was also thinking that.

webknjaz avatar Nov 30 '23 13:11 webknjaz

https://github.com/aio-libs/aiohttp/blob/213d1b22d42ce8efc54d7858b490e920fcdb4f0a/tests/test_web_sendfile_functional.py#L951-L989

When I injected a breakpoint here under the except block, and modified client_proto.py to include the causing exception's details, I extracted this when running the specific test with pytest -s:

(Pdb) pp traceback.print_exc()
Traceback (most recent call last):
  File "~/src/github/aio-libs/aiohttp/aiohttp/client_proto.py", line 95, in connection_lost
    uncompleted = self._parser.feed_eof()
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "aiohttp/_http_parser.pyx", line 510, in aiohttp._http_parser.HttpParser.feed_eof
    raise ContentLengthError(
aiohttp.http_exceptions.ContentLengthError: 400, message:
  Not enough data for satisfy content length header.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~/src/github/aio-libs/aiohttp/tests/test_web_sendfile_functional.py", line 985, in test_static_file_huge_cancel
    data += await resp.content.read(1024)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/src/github/aio-libs/aiohttp/aiohttp/streams.py", line 347, in read
    raise self._exception
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'>
None

webknjaz avatar Nov 30 '23 13:11 webknjaz