filesystem_spec
HTTPFileSystem prints tracebacks if skip_instance_cache=True if called more than once
I have been debugging an issue for a while and finally have a small reproducible snippet, using fsspec==2023.12.2:
import fsspec

some_public_url = "http://replace.me.with.a.public.url"


def read_my_data(url: str) -> None:
    for i in range(10):
        fs = fsspec.filesystem("http", skip_instance_cache=True)
        read_data = fs.read_text(url)
        print(i, read_data)


read_my_data(some_public_url)
The output looks something like
0
1
2
Exception ignored in: <finalize object at 0x7f644203f9e0; dead>
Traceback (most recent call last):
  File "/lib/python3.10/weakref.py", line 591, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/lib/python3.10/site-packages/fsspec/implementations/http.py", line 125, in close_session
    sync(loop, session.close, timeout=0.1)
  File "/lib/python3.10/site-packages/fsspec/asyn.py", line 80, in sync
    raise NotImplementedError("Calling sync() from within a running loop")
NotImplementedError: Calling sync() from within a running loop
Exception ignored in: <finalize object at 0x7f644203fae0; dead>
Traceback (most recent call last):
  File "/python3.10/weakref.py", line 591, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/lib/python3.10/site-packages/fsspec/implementations/http.py", line 125, in close_session
    sync(loop, session.close, timeout=0.1)
  File "/python3.10/site-packages/fsspec/asyn.py", line 80, in sync
    raise NotImplementedError("Calling sync() from within a running loop")
NotImplementedError: Calling sync() from within a running loop
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f644207e2f0>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f644207ee90>
3
4
5
done
The number of successes before this error is printed is not consistent, but it is usually 2 or 3. The traceback is not fatal and the correct data is read, but it is ugly to have printed and I do not know how to suppress it.
If instead skip_instance_cache=False I do not see this issue. My guess is something wrong is cached internally between runs?
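The "Exception ignored in: <finalize object ...>" header suggests why the traceback is not fatal: an exception raised inside a weakref.finalize callback is reported through Python's unraisable-exception hook instead of being propagated to the caller. The following is a minimal sketch of that behaviour using only the standard library. `Resource`, `failing_cleanup`, and `record_unraisable` are hypothetical names for illustration and are not part of fsspec.

```python
import gc
import sys
import weakref

captured = []  # names of exception types seen by the hook


def record_unraisable(unraisable):
    """Hypothetical hook: record the exception type instead of printing it."""
    captured.append(unraisable.exc_type.__name__)


class Resource:
    """Hypothetical stand-in for an object that owns a session."""


def failing_cleanup():
    # Mimics the error seen in the traceback above.
    raise NotImplementedError("Calling sync() from within a running loop")


previous_hook = sys.unraisablehook
sys.unraisablehook = record_unraisable
try:
    res = Resource()
    weakref.finalize(res, failing_cleanup)
    del res        # dropping the last reference triggers the finalizer
    gc.collect()   # also covers interpreters without reference counting
    survived = True  # reached because the exception did not propagate
finally:
    sys.unraisablehook = previous_hook

print(captured, survived)
```

Replacing the hook globally hides every unraisable exception in the process, so this is better suited to diagnosing the behaviour than to silencing it in production code.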
If instead skip_instance_cache=False I do not see this issue.
This is normal usage, so most people are not seeing this issue.
You are apparently creating and destroying HTTPFileSystem instances and their corresponding aiohttp connection pools faster than the event loop can keep up with. The exception is raised during finalize(), when the instance gets cleaned up, and exactly when that happens can vary. That event loop runs in a different thread, which is why you get the intermittent behaviour.
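One plausible reading of the traceback is that the finalizer sometimes runs on the thread that owns the event loop, where a blocking sync() call cannot be allowed to wait on its own loop. The sketch below reproduces that guard with a simplified stand-in. The `sync` function here is an assumption written for illustration, not fsspec's actual implementation, and `start_background_loop`, `close`, and `cleanup_on_loop_thread` are hypothetical helpers.

```python
import asyncio
import threading


def start_background_loop() -> asyncio.AbstractEventLoop:
    """Run an event loop in a daemon thread (stand-in for a dedicated IO loop)."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


def sync(loop, func, *args, timeout=None):
    """Simplified stand-in: run coroutine function `func` on `loop` and block."""
    try:
        running = asyncio.get_running_loop()
    except RuntimeError:
        running = None  # no event loop is running in the calling thread
    if running is loop:
        # Blocking here would make the loop wait on itself.
        raise NotImplementedError("Calling sync() from within a running loop")
    future = asyncio.run_coroutine_threadsafe(func(*args), loop)
    return future.result(timeout)


async def close():
    return "closed"


async def cleanup_on_loop_thread(loop):
    """Simulates cleanup code that happens to execute on the loop's own thread."""
    try:
        sync(loop, close, timeout=0.1)
    except NotImplementedError as exc:
        return str(exc)
    return "no error"


loop = start_background_loop()
from_main = sync(loop, close, timeout=1)                         # called from another thread
from_loop = sync(loop, cleanup_on_loop_thread, loop, timeout=1)  # nested call on the loop thread
loop.call_soon_threadsafe(loop.stop)

print(from_main)
print(from_loop)
```

The same cleanup call succeeds from the main thread and fails on the loop thread, which would match the intermittent pattern if the point at which the instance is collected varies between iterations.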
Maybe the following helps?
--- a/fsspec/implementations/http.py
+++ b/fsspec/implementations/http.py
@@ -123,7 +123,7 @@ class HTTPFileSystem(AsyncFileSystem):
             try:
                 sync(loop, session.close, timeout=0.1)
                 return
-            except (TimeoutError, FSTimeoutError):
+            except (TimeoutError, FSTimeoutError, NotImplementedError):
                 pass
         connector = getattr(session, "_connector", None)
Yes, I think catching it there would resolve this.
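The pattern in the proposed patch can be sketched in isolation: a finalizer tries the blocking close, and if that raises NotImplementedError it falls through to a non-blocking fallback instead of letting the exception escape. Everything below is a hypothetical stand-in. `FakeSession`, `sync_close`, and `Owner` are not fsspec or aiohttp names, and the fsspec-specific FSTimeoutError is omitted so the example runs with the standard library alone.

```python
import gc
import weakref


class FakeSession:
    """Hypothetical stand-in for a client session."""

    def __init__(self):
        self.closed = False


def sync_close(session):
    """Stand-in for the blocking close, failing as it does in the traceback."""
    raise NotImplementedError("Calling sync() from within a running loop")


def close_session(session):
    try:
        sync_close(session)
        return
    except (TimeoutError, NotImplementedError):
        pass  # fall through to the non-blocking fallback
    # Stand-in for the connector cleanup that follows in the real method.
    session.closed = True


class Owner:
    """Hypothetical stand-in for a filesystem instance that owns a session."""

    def __init__(self):
        self.session = FakeSession()
        # The finalizer receives the session, not the owner, so it does not
        # keep the owner alive.
        weakref.finalize(self, close_session, self.session)


owner = Owner()
session = owner.session
del owner      # triggers the finalizer
gc.collect()   # also covers interpreters without reference counting

print(session.closed)
```

With NotImplementedError in the except clause, the finalizer finishes quietly and the fallback still runs, so no "Exception ignored in" traceback is produced in this sketch.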
Feel free to make such a PR. I don't know if you can test it reliably, since the CI's event loop probably runs much slower than any local installation.