filesystem_spec
Reading data from `raw.githubusercontent.com` hangs
What
When using the GitHub fsspec implementation, an attempt to `open(...)` data at `raw.githubusercontent.com` hangs indefinitely on my local machine.
Why
My DNS server returns both IPv4 and IPv6 addresses for raw.githubusercontent.com:
> nslookup raw.githubusercontent.com <IP_OF_MY_DNS_SERVER>
...
Name: raw.githubusercontent.com
Address: 185.199.109.133
Name: raw.githubusercontent.com
Address: 185.199.108.133
Name: raw.githubusercontent.com
Address: 185.199.111.133
Name: raw.githubusercontent.com
Address: 185.199.110.133
Name: raw.githubusercontent.com
Address: 2606:50c0:8002::154
Name: raw.githubusercontent.com
Address: 2606:50c0:8000::154
Name: raw.githubusercontent.com
Address: 2606:50c0:8003::154
Name: raw.githubusercontent.com
Address: 2606:50c0:8001::154
(Note: urllib3/requests gives an identical response for me.)
The IPv4 addresses are fine. However, I'm not able to establish a connection with any of the IPv6 addresses, e.g.:
❯ ping -w 30 2606:50c0:8001::154
PING 2606:50c0:8001::154(2606:50c0:8001::154) 56 data bytes
--- 2606:50c0:8001::154 ping statistics ---
30 packets transmitted, 0 received, 100% packet loss, time 29673ms
This creates a problem when a program tries an IPv6 address before an IPv4 one: it hangs waiting for a connection that is never established. Some (likely unresolved) issues from GitHub community that appear relevant: Issue 1 Issue 2
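To check which of the resolved addresses actually accept a connection, something like the following could help. It is only a rough sketch using the standard library; `probe_addresses` is a name made up for this example, and the port and timeout defaults are assumptions.

```python
import socket


def probe_addresses(host, port=443, timeout=3.0):
    """Try a TCP connection to every address the resolver returns for host.

    Returns a list of (family_label, address, reachable) tuples.
    """
    results = []
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        try:
            with socket.socket(family, socktype, proto) as sock:
                # Bound the wait so an unreachable address cannot hang us.
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            reachable = True
        except OSError:
            # Covers timeouts, refused connections and unroutable networks.
            reachable = False
        results.append((label, sockaddr[0], reachable))
    return results
```

Calling `probe_addresses("raw.githubusercontent.com")` on a machine with the problem described above would be expected to report the IPv4 entries as reachable and the IPv6 entries as unreachable, each IPv6 attempt giving up after `timeout` seconds instead of blocking.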
Proposed fix
Introduce a timeout keyword on the filesystem instance here. This will allow a program to move forward when there is no response. This is also the approach recommended by requests. If this sounds okay, let me know and I would be happy to open a small PR.
Thanks for a cool project!
Yes, I think all usage of requests could benefit from a timeout parameter. This can be fairly large, but if you allow for user override, all the better.
In the block you highlighted, note the use of **self.kw, but it doesn't appear to be populated by the kwargs passed to the init method.
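For illustration, here is a rough sketch of how the two points above could fit together: a default timeout that the user can override, plus extra keyword arguments that are actually stored and forwarded. `TimeoutAwareFS`, its `cat` method and the default of 60 seconds are all hypothetical and not the real fsspec code; the only assumption about `session` is that it behaves like a `requests.Session`.

```python
class TimeoutAwareFS:
    """Hypothetical stand-in for a requests-backed filesystem (not fsspec's class)."""

    # Assumed (connect, read) timeout in seconds; deliberately generous.
    DEFAULT_TIMEOUT = (60, 60)

    def __init__(self, session, timeout=None, **kwargs):
        self.session = session
        # Store the extra keyword arguments so they reach every HTTP call.
        self.kw = dict(kwargs)
        # A user-supplied timeout wins; otherwise fall back to the default.
        self.kw["timeout"] = self.DEFAULT_TIMEOUT if timeout is None else timeout

    def cat(self, url):
        """Fetch url and return the response body as bytes."""
        response = self.session.get(url, **self.kw)
        response.raise_for_status()
        return response.content
```

With something like `TimeoutAwareFS(requests.Session(), timeout=10)`, a connection attempt to an unreachable address would raise a requests timeout error after about ten seconds rather than hanging.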
I've pushed something in #1473. Let me know what you would like changed.