Cannot close s3fs files during interpreter shutdown
This issue is really annoying and I'm not sure whether it's a problem with s3fs, aiobotocore, or aiohttp. S3File is absolutely amazing, and one use case is streaming log messages to S3 through a Python logging StreamHandler. This works really well unless something exits the interpreter and forces the logging atexit handler to run. At that point .close() explodes, because it ends up making aiohttp spawn a thread to resolve the S3 endpoint, which isn't allowed during interpreter shutdown.
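For context, the setup looks roughly like this (the bucket/key and handler wiring are illustrative - a sketch of the use case rather than my real code):

import atexit
import logging
import s3fs

fs = s3fs.S3FileSystem()
log_file = fs.open("s3://my-bucket/app.log", "w")  # illustrative path

# Records are written into the S3File's in-memory buffer; nothing hits
# the network until the buffer grows past the block size or the file
# is closed.
logging.getLogger().addHandler(logging.StreamHandler(log_file))

# StreamHandler.close() does not close the underlying stream, so the
# file has to be closed explicitly - and that close is what triggers
# the upload (and the crash) during interpreter shutdown.
atexit.register(log_file.close)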
Basically, the following code fails and that's a real shame:
In [1]: import s3fs, atexit
In [2]: fs = s3fs.S3FileSystem()
In [6]: fd = fs.open("s3://foo/bar", "wb", block_size=1024*1024*5)
In [7]: fd.write(b'Lol')
Out[7]: 3
In [8]: atexit.register(lambda: fd.close())
Out[8]: <function __main__.<lambda>()>
In [9]: exit()
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/fsspec/asyn.py", line 25, in _runner
result[0] = await coro
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/s3fs/core.py", line 268, in _call_s3
raise err
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/s3fs/core.py", line 248, in _call_s3
out = await method(**additional_kwargs)
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiobotocore/client.py", line 141, in _make_api_call
http, parsed_response = await self._make_request(
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiobotocore/client.py", line 161, in _make_request
return await self._endpoint.make_request(operation_model, request_dict)
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiobotocore/endpoint.py", line 81, in _send_request
while await self._needs_retry(attempts, operation_model,
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiobotocore/endpoint.py", line 214, in _needs_retry
responses = await self._event_emitter.emit(
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiobotocore/hooks.py", line 29, in _emit
response = handler(**kwargs)
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/botocore/retryhandler.py", line 183, in __call__
if self._checker(attempts, response, caught_exception):
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/botocore/retryhandler.py", line 250, in __call__
should_retry = self._should_retry(attempt_number, response,
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/botocore/retryhandler.py", line 269, in _should_retry
return self._checker(attempt_number, response, caught_exception)
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/botocore/retryhandler.py", line 316, in __call__
checker_response = checker(attempt_number, response,
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/botocore/retryhandler.py", line 222, in __call__
return self._check_caught_exception(
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
raise caught_exception
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiobotocore/endpoint.py", line 148, in _do_get_response
http_response = await self._send(request)
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiobotocore/endpoint.py", line 230, in _send
return await self.http_session.send(request)
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiobotocore/httpsession.py", line 155, in send
resp = await self._session.request(
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiohttp/client.py", line 520, in _request
conn = await self._connector.connect(
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiohttp/connector.py", line 535, in connect
proto = await self._create_connection(req, traces, timeout)
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiohttp/connector.py", line 892, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiohttp/connector.py", line 999, in _create_direct_connection
hosts = await asyncio.shield(host_resolved)
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiohttp/connector.py", line 865, in _resolve_host
addrs = await self._resolver.resolve(host, port, family=self._family)
File "/Users/tom/Library/Caches/pypoetry/virtualenvs/foo-nQOGymRb-py3.9/lib/python3.9/site-packages/aiohttp/resolver.py", line 31, in resolve
infos = await self._loop.getaddrinfo(
File "/usr/local/Cellar/[email protected]/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 856, in getaddrinfo
return await self.run_in_executor(
File "/usr/local/Cellar/[email protected]/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 814, in run_in_executor
executor.submit(func, *args), loop=self)
File "/usr/local/Cellar/[email protected]/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 161, in submit
raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
Thanks for reporting this. I have no immediate thought on how to fix it - presumably you would like the file to be flushed to remote and properly closed before shutdown?
Ideally, yes. I've got a merge request open to allow customising the DNS resolver in aiohttp, which should mean this will work fine - the thread is only spawned to resolve the S3 endpoint, and there is an async resolver we can use instead.
One thing to note is that the asynchronous resolver is not the default because there are some kinds of obscure DNS records it does not resolve correctly. I'll read up on the specifics, but I don't think any of those issues will apply to S3, so perhaps we'd be OK with setting it as the default resolver?
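For illustration, this is the aiohttp API in question - AsyncResolver is backed by aiodns/c-ares and does the lookup on the event loop, whereas the default ThreadedResolver runs getaddrinfo in an executor thread (this sketch assumes the optional aiodns dependency is installed):

import asyncio
import aiohttp

async def main():
    # Swap the default ThreadedResolver for the thread-free AsyncResolver.
    connector = aiohttp.TCPConnector(resolver=aiohttp.AsyncResolver())
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get("https://s3.amazonaws.com") as resp:
            print(resp.status)

asyncio.run(main())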
I am no kind of expert on that! Your script demonstrates the original problem, so if that goes away... well, it should also be tried against some custom endpoints like moto/minio.
Since the resolution here actually depends on a change in another project, should we close this issue?
From my perspective it's still a bug with s3fs: I'm happy to add a regression test for this, and make the change to use the async resolver if it's available.
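Roughly, the "use it if it's available" part would look something like this (the helper name is made up, and how it gets wired through aiobotocore's http session is the real work):

import aiohttp

def choose_resolver():
    # Prefer the thread-free aiodns-backed resolver when it can be
    # imported; fall back to aiohttp's default ThreadedResolver.
    # (Call this while the event loop is running, since AsyncResolver
    # binds to the current loop.)
    try:
        import aiodns  # noqa: F401
        return aiohttp.AsyncResolver()
    except ImportError:
        return aiohttp.ThreadedResolver()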
If that's easy to do (not much effort from you and unintrusive for users), I'd be happy to see it.
Sure, I'll also take the time to bump aiobotocore to 2.x, which should be the release my changes go into?
I guess, if everything still works well with it. Big scary change!
Let's do it after the release I am preparing right now.