filesystem_spec

Reading data from `raw.githubusercontent.com` hangs

Open • nenb opened this issue 1 year ago • 2 comments

What

When using the GitHub fsspec implementation, attempting to open(...) a file hosted at raw.githubusercontent.com hangs indefinitely on my local machine.

Why

My DNS server returns both IPv4 and IPv6 addresses for raw.githubusercontent.com:

> nslookup raw.githubusercontent.com <IP_OF_MY_DNS_SERVER>

...

Name:	raw.githubusercontent.com
Address: 185.199.109.133
Name:	raw.githubusercontent.com
Address: 185.199.108.133
Name:	raw.githubusercontent.com
Address: 185.199.111.133
Name:	raw.githubusercontent.com
Address: 185.199.110.133
Name:	raw.githubusercontent.com
Address: 2606:50c0:8002::154
Name:	raw.githubusercontent.com
Address: 2606:50c0:8000::154
Name:	raw.githubusercontent.com
Address: 2606:50c0:8003::154
Name:	raw.githubusercontent.com
Address: 2606:50c0:8001::154

(Note: resolving the name through urllib3/requests returns the same addresses for me.)

The IPv4 addresses are reachable. However, I'm not able to establish a connection to the IPv6 addresses, e.g.:

❯ ping -w 30 2606:50c0:8001::154
PING 2606:50c0:8001::154(2606:50c0:8001::154) 56 data bytes

--- 2606:50c0:8001::154 ping statistics ---
30 packets transmitted, 0 received, 100% packet loss, time 29673ms

This causes a problem when a program tries to connect to an IPv6 address before an IPv4 one: the program hangs while waiting to establish a connection. Some (likely unresolved) issues from the GitHub community that appear relevant: Issue 1, Issue 2.
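To make the failure mode concrete, here is a minimal sketch of how a client walks a resolved address list. The helper `connect_first_reachable` is hypothetical; the standard library's `socket.create_connection((host, port), timeout=...)` performs a similar loop over `socket.getaddrinfo` results. The key point is the per-attempt timeout: without it, an unreachable IPv6 address at the head of the list blocks until the operating system gives up, and the reachable IPv4 addresses are never tried. So that the demo runs offline, a refused local port stands in for the unreachable IPv6 address; a genuinely unreachable address would instead use up the full `timeout` before the loop moves on.

```python
import socket

def connect_first_reachable(addresses, timeout):
    """Try each (family, sockaddr) pair in order; return the first connected socket.

    The per-attempt timeout caps how long a dead address (e.g. an IPv6 route
    that silently drops packets) can stall the loop before the next one is tried.
    """
    last_error = None
    for family, sockaddr in addresses:
        sock = socket.socket(family, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as err:  # covers timeouts and refused connections
            sock.close()
            last_error = err
    raise last_error if last_error else OSError("no addresses to try")

# Offline demo: one port that refuses connections, then one that accepts them.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
open_port = listener.getsockname()[1]

probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
probe.bind(("127.0.0.1", 0))
closed_port = probe.getsockname()[1]
probe.close()  # nothing listens here any more, so connecting is refused

conn = connect_first_reachable(
    [(socket.AF_INET, ("127.0.0.1", closed_port)),
     (socket.AF_INET, ("127.0.0.1", open_port))],
    timeout=1.0,
)
peer = conn.getpeername()
conn.close()
listener.close()
```

Against the real host, the address list could be built with `[(ai[0], ai[4]) for ai in socket.getaddrinfo("raw.githubusercontent.com", 443, type=socket.SOCK_STREAM)]`, which yields both the IPv4 and IPv6 entries shown in the nslookup output.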

Proposed fix

Introduce a `timeout` keyword on the filesystem instance here. This will let a program move on when no response arrives. It is also the approach recommended by requests. If this sounds okay, let me know and I'd be happy to open a small PR.
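A minimal sketch of what a timeout buys, using the standard library's `urllib.request` instead of requests so it runs without extra dependencies (requests' `timeout=` argument plays the same role). The helper name `fetch_or_give_up`, the local "silent" server, and the 0.5 s value are illustrative assumptions: the server accepts the TCP connection but never writes an HTTP response, which mimics a stalled request. Without a timeout the call would block indefinitely; with one, it gives up and returns `None`.

```python
import socket
import time
import urllib.request

def fetch_or_give_up(url, timeout):
    """Return the response body, or None if the request fails or stalls past `timeout` seconds."""
    # Bypass any proxy settings from the environment so we talk to `url` directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.read()
    except OSError:  # URLError and socket timeouts are both OSError subclasses
        return None

# A listening socket that never answers: the kernel completes the TCP
# handshake, but no HTTP response is ever written back.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))
silent.listen()
host, port = silent.getsockname()

start = time.monotonic()
body = fetch_or_give_up(f"http://{host}:{port}/", timeout=0.5)
elapsed = time.monotonic() - start
silent.close()
```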

Thanks for a cool project!

nenb • Dec 16 '23 10:12

Yes, I think every use of requests could benefit from a timeout parameter. The default can be fairly large, but if you allow the user to override it, all the better.

In the block you highlighted, note the use of `**self.kw`; however, it doesn't appear to be populated from the kwargs passed to `__init__`.
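One way to address both points, sketched under assumptions: the class name `ExampleHTTPBackedFS`, the `(60, 60)` default, and the injectable `http_get` parameter are hypothetical and are not fsspec's actual `GithubFileSystem` API. The idea is to keep the extra kwargs given to `__init__` in `self.kw` (so the `**self.kw` splat at each call site is no longer empty), seed it with a generous default `timeout`, and let both the constructor and individual calls override it. A `(connect, read)` tuple is one of the timeout forms requests accepts.

```python
DEFAULT_TIMEOUT = (60, 60)  # hypothetical default: (connect, read) seconds, deliberately generous

class ExampleHTTPBackedFS:
    """Hypothetical filesystem that forwards its init kwargs to every HTTP call via self.kw."""

    def __init__(self, timeout=DEFAULT_TIMEOUT, http_get=None, **kwargs):
        # Keep the caller's extra kwargs instead of discarding them, so that
        # **self.kw at each call site actually carries something.
        self.kw = {"timeout": timeout, **kwargs}
        self._http_get = http_get

    def _get(self, url, **overrides):
        http_get = self._http_get
        if http_get is None:
            import requests  # only needed when no getter is injected
            http_get = requests.get
        # Per-call overrides take precedence over instance-level settings.
        return http_get(url, **{**self.kw, **overrides})

# Offline usage: a recording stand-in for requests.get.
calls = []
def record_get(url, **kwargs):
    calls.append((url, kwargs))
    return "response"

fs = ExampleHTTPBackedFS(http_get=record_get, headers={"Accept": "application/json"})
fs._get("https://example.invalid/a")             # uses the instance default timeout
fs._get("https://example.invalid/b", timeout=5)  # per-call override
```

Passing `timeout=None` explicitly would restore the wait-forever behaviour, so that choice stays with the caller rather than being the default.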

martindurant • Dec 18 '23 15:12

I've pushed something in #1473. Let me know what you would like changed.

nenb • Dec 18 '23 19:12