aiohttp
                                
                                 aiohttp copied to clipboard
                                
                                    aiohttp copied to clipboard
                            
                            
                            
                        Task exception was never retrieved on v3.8.3
Describe the bug
We bumped from v3.8.1 to v3.8.3 this morning and started seeing the following errors. No other changes were done to our project:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py", line 505, in start
    request = self._request_factory(message, payload, self, writer, handler)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_app.py", line 446, in _make_request
    return _cls(
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_request.py", line 811, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_request.py", line 188, in __init__
    assert transport is not None
AssertionError
And
Task exception was never retrieved
future: <Task finished name='Task-30396' coro=<RequestHandler.start() done, defined at /usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py:462> exception=AssertionError()>
To Reproduce
- We were using v3.8.1
- Bump to v3.8.3
No other changes introduced from our side, those errors started appearing.
Expected behavior
For that exception to be handled.
Logs/tracebacks
Added them on the description, adding them here as well:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py", line 505, in start
    request = self._request_factory(message, payload, self, writer, handler)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_app.py", line 446, in _make_request
    return _cls(
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_request.py", line 811, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/aiohttp/web_request.py", line 188, in __init__
    assert transport is not None
AssertionError
And
Task exception was never retrieved
future: <Task finished name='Task-30396' coro=<RequestHandler.start() done, defined at /usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py:462> exception=AssertionError()>
Python Version
$ python --version
3.9.13
aiohttp Version
$ python -m pip show aiohttp
3.8.3
multidict Version
$ python -m pip show multidict
6.0.2
yarl Version
$ python -m pip show yarl
1.8.1
OS
Docker image
Related component
Server
Additional context
I have added as much context as I had, if I left out anything relevant, happy to look for it and add it to the issue.
Code of Conduct
- [X] I agree to follow the aio-libs Code of Conduct
Initial investigation of the exception not retrieved: The task is created here: https://github.com/aio-libs/aiohttp/blob/master/aiohttp/web_protocol.py#L305 Then it looks like it should get awaited at: https://github.com/aio-libs/aiohttp/blob/master/aiohttp/web_protocol.py#L288
If an exception is raised, then .done() would be True, so the await doesn't happen. My initial thought is that there should not be a .done() check on that code.
Not investigated anything about the assertion error though.
I've also just recently seen the exact error described by @w0rmr1d3r pop up, seemingly at random. Will report back if we're able to get any more clarity or fixes on our end. Thanks everyone for looking :)
Same issue after updating to v3.8.3
Same issue after updating to v3.8.3
Indeed @doublex . This started appearing in our services when upgrading to v3.8.3 It doesn't happen with v3.8.1
There aren't too many changes (if you ignore formatting/typing changes), so if anyone can take a look and figure out which change is the issue, that'd be great.
These 2 removed lines are the only ones that stand out to me, so maybe try adding them back in first: https://github.com/aio-libs/aiohttp/compare/v3.8.1...v3.8.3#diff-2126b277e07e3fdd05e7a81da456accf24e5515a46c78c48a79db4eb166043e4L302-L303
Hi @Dreamsorcerer ,
Thank you for providing the diff between the affected versions.
I'm reading through the code and trying to understand. But you will have more context and knowledge on how this library works. My guessings:
File web_protocol.py on line 505, will always pass the handler as None. Since it's set on line 472 to be the self._task_handler. Which on line 305 is set as None, and the change on the highlighted lines never cancels the task_handler.
Still, I lack complete context of the code, just writing what I see.
Then, in the web_app.py file, line 437 -> 454, protocol will be a RequestHandler, but is received as None, since as seen in this line: https://github.com/aio-libs/aiohttp/blob/8cf01adc8c8dbf706e4cd33bf89fd5195f638715/aiohttp/web_protocol.py#L51 it can be "asyncio.Task[None]".
I guess, that is setting it to None, somehow?
Again, just tried to read through the code, but have 0 context on how the library works.
My guesses might be completely wrong, apologies for that.
File web_protocol.py on line 505, will always pass the handler as None. Since it's set on line 472 to be the self._task_handler.
Line 473 asserts it to be not None, so that's not possible.
My suggestion was to try adding back in the lines 302-303 that were deleted and see if the problem is still reproducible. If it is, then maybe checkout the code at different commits until you find the commit that introduces the problem (you could use git bisect for this). Also remember to do this on the 3.9 branch, not master.
Hello @Dreamsorcerer , Sorry for my late response.
So, what you mean is, adding back those lines in the next release and see if the issue is fixed? I understand that to be version 3.9.0? And if those lines make the issue go away, re-review the commits on branch 3.9?
Just wanted to clarify and get right the steps on this.
Thank you for providing an answer and maintain this project 🚀
Yes, just clone the 3.9 branch, add those lines in, then test if the issue is fixed (also check the issue is reproducible before adding the line in). If so, make a PR and we'll look at releasing it in 3.9.
Hello @Dreamsorcerer ,
I haven't had time to put effort on this.
We received the dependabot bump PR for 3.8.4 and saw the same error.
Hopefully in the future I might be able to put some effort on this to debug it or something.
3.8.4 only added a moderate severity fix. If you can figure out this issue, we'll put it into the 3.9 release.
Just here to confirm that we can see this issue happening in python 3.9, 3.10 and 3.11 with aiohttp 3.8.4
I can also confirm that this issue occurs in our project with aiohttp 3.8.3 and Python 3.11.
There aren't too many changes (if you ignore formatting/typing changes), so if anyone can take a look and figure out which change is the issue, that'd be great. These 2 removed lines are the only ones that stand out to me, so maybe try adding them back in first: https://github.com/aio-libs/aiohttp/compare/v3.8.1...v3.8.3#diff-2126b277e07e3fdd05e7a81da456accf24e5515a46c78c48a79db4eb166043e4L302-L303
@Dreamsorcerer I tried reverting these lines on top of v3.8.3 and this does not fix the issue. They were also the only lines standing out to me in the git history, so I'll need to take a deeper look.
Looking at the logic, it seems like this originates from the connection_lost path: the transport is set to None early in BaseProtocol.connection_lost(). I could not reproduce this issue locally though so no way to tell for sure. I didn't see anything out of the ordinary in the code, so maybe the issue is somewhere else, ex: some level of concurrency happening between connection_lost() and the request handler?
My setup:
- Python 3.11
- aiohttp 3.8.3
- gunicorn + uvloop workers.
@Dreamsorcerer I tried reverting these lines on top of v3.8.3 and this does not fix the issue. They were also the only lines standing out to me in the git history, so I'll need to take a deeper look.
If you can run it from the git repo, then a git bisect or something could help find the commit which introduced the issue.
@Dreamsorcerer I tried reverting these lines on top of v3.8.3 and this does not fix the issue. They were also the only lines standing out to me in the git history, so I'll need to take a deeper look.
If you can run it from the git repo, then a
git bisector something could help find the commit which introduced the issue.
I'd love to, but I have no idea how to build the package properly to test inside a Docker image (only solution I have to reproduce the issue at the moment). I've looked at the docs but I couldn't find a tip, I always end up with "aiohttp/_websocket.c: File not found". Any idea?
In the meantime, I realized I hadn't installed my patched version correctly. It appears that applying the patch below as suggested fixes the issue.
diff --git a/aiohttp/web_protocol.py b/aiohttp/web_protocol.py
index 10a96080..f103b167 100644
--- a/aiohttp/web_protocol.py
+++ b/aiohttp/web_protocol.py
@@ -299,6 +299,8 @@ class RequestHandler(BaseProtocol):
                 exc = ConnectionResetError("Connection lost")
             self._current_request._cancel(exc)
 
+        if self._task_handler is not None:
+            self._task_handler.cancel()
         if self._waiter is not None:
             self._waiter.cancel()
Now, I saw in the commit history that this patch has already been applied, removed and reapplied (cf. the discussion here: https://github.com/aio-libs/aiohttp/issues/6719). What's the best course of action? I don't see cancellation exceptions in my setup, maybe the whole issue discussed in #6719 has been fixed in the meantime.
Now, I saw in the commit history that this patch has already been applied, removed and reapplied (cf. the discussion here: #6719). What's the best course of action?
OK, I've got a vague idea on what the cause may be now. Can you try removing the  and not self._task_handler.done() (not on a production server though, I expect it to break):
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/web_protocol.py#L287
My theory is that on connection_lost on older releases (and optionally in 3.9) we cancel the handler. Now we are not cancelling, I suspect there may be an exception occurring on self.transport.close() or something similar, because the connection was already closed:
https://github.com/aio-libs/aiohttp/blob/master/aiohttp/web_protocol.py#L610
So, I'm hoping that by making that change, instead of an exception was not retrieved warning, you'll get the actual exception and then we can catch that error in future.
This also suggests that we may be able to build an actual test, if anyone wants to give that a go? Hopefully, it just means we need an endpoint that takes a couple of seconds and then have a client disconnect prematurely to trigger the warning.
I've tried replicating this issue in a test, but it passes just fine on the 3.8 branch. Am I approaching this in a wrong way?
async def test_client_connection_lost(aiohttp_client: Any) -> None:
    async def handler(_):
        await asyncio.sleep(2)
        return web.Response()
    app = web.Application()
    app.router.add_route("GET", "/", handler)
    client = await aiohttp_client(app)
    with pytest.raises(TimeoutError):
        await asyncio.wait_for(
            client.get("/"),
            timeout=1,
        )
    await client.close()
I've set a breakpoint in BaseProtocol.connection_lost and saw it being called twice, once for the client, then another time for the server.
I've never managed to reproduce it, which is why I keep asking others to test/debug.
I am currently facing this issue, I do not think it's easily reproduced in a test as you need to send some sort of client/server disconnection into the request/response handler.
Task exception was never retrieved
future: <Task finished name='Task-70887' coro=<RequestHandler.start() done, defined at /opt/venv/lib/python3.10/site-packages/aiohttp/web_protocol.py:462> exception=AssertionError()>
I will go down back to 3.8.1.
If my theory is right, then we should be able to write a test. But, I'm still waiting on someone to confirm the cause by hopefully getting the traceback as described in https://github.com/aio-libs/aiohttp/issues/6978#issuecomment-1520627194
I have been pretty busy since April but I'll try to reproduce the issue in the weeks to come.
OK, I played around with the code, manually adding in exceptions etc. and I couldn't figure out any way to reproduce this. Hopefully, #7333 will help, but I don't think there's anything else I can do unless someone figures something else out.
Once #7333 is released I can definitely give it a try.
Still happening for v3.8.5 which includes #7333
Continuing the conversation that started here.
I think this ticket is the correct one for my case given that the traceback is the same here.
- The issue reproduces relatively rare (up to 10 times per day) in the prod env, it could happen with any end point randomly.
- To test @Dreamsorcerer your suggestions, I need to deploy a custom version of aiohttp to the cloud (GAE), but whatever I tried I couldn't make GAE to use my modified version of the library and it either continued to use the version from PyPI or fail to run.
- So instead I ran a local env with the modified version of the library. Then wrote a browser script that from 4 separate tabs triggered 4 different end points every 10 ms. Unfortunately even after about 40K requests, not a single time I got the exception and accordingly self._force_close, self._closewere alwaysFalseand nothing broke from doing this comment neither.
Is there any way that I can add additional logs to test this in the current version of the library that I use in prod (3.8.5)? Maybe call some aiohttp method and print something upon any end point being hit?
Appreciate your help with trying to figure this out.
2. To test @Dreamsorcerer your suggestions, I need to deploy a custom version of aiohttp to the cloud (GAE), but whatever I tried I couldn't make GAE to use my modified version of the library and it either continued to use the version from PyPI or fail to run.
I'm not failiar with GAE, but can you just drop aiohttp into the project directory, where it loads your app from?
Is there any way that I can add additional logs to test this in the current version of the library that I use in prod (3.8.5)? Maybe call some aiohttp method and print something upon any end point being hit?
I can't think of an easy way. You could try and monkey patch the library, but you'd probably have to copy the entire function (e.g. .start() method), add the print in, and then assign it to the aiohttp class.
I'm not failiar with GAE, but can you just drop aiohttp into the project directory, where it loads your app from?
Or failing that, copy into the app's directory as a module, and then change the imports to from myapp import aiohttp, if necessary (assuming that doesn't break aiohttp's imports).