inconsistent path parsing from url_to_fs

jhamman opened this issue • 2 comments

fsspec.url_to_fs seems to parse the path from URLs inconsistently.

import fsspec

print(fsspec.url_to_fs("s3://icechunk-test/ryan"))
print(fsspec.url_to_fs("http://earthmover.io/joe"))

(<s3fs.core.S3FileSystem object at 0x1334187a0>, 'icechunk-test/ryan')
(<fsspec.implementations.http.HTTPFileSystem object at 0x133419550>, 'http://earthmover.io/joe')

Why does the path from the http example include the scheme?

jhamman, Oct 13 '24 03:10

> Why does the path from the http example include the scheme?

The HTTP implementation handles http and https transparently, on the same client and connection pool. The two types of URL are distinguishable only by their protocol, so the lower-level client needs to see the whole URL to make the right call.
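To see why the scheme has to survive, here is a minimal sketch of what would happen if the HTTP backend stripped the protocol the way the s3 backend does. The helper naive_strip is hypothetical, not part of fsspec:

```python
def naive_strip(url: str) -> str:
    """Hypothetical helper: drop everything up to and including '://'."""
    if "://" in url:
        return url.split("://", 1)[1]
    return url


plain = naive_strip("http://earthmover.io/joe")
secure = naive_strip("https://earthmover.io/joe")

# Both collapse to 'earthmover.io/joe', so the client could no longer
# tell whether to open a plain or a TLS connection.
print(plain == secure)  # True
```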

Conversely, s3/s3a and gs/gcs are allowed prefix aliases, but the backend doesn't use the prefix at all in the actual call to the remote store.
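A minimal sketch of alias stripping, assuming a backend that lists its accepted prefixes in a tuple. S3_PROTOCOLS and strip_alias are hypothetical names for illustration, not fsspec's or s3fs's actual code:

```python
# Hypothetical: prefixes that all route to the same S3 backend.
S3_PROTOCOLS = ("s3", "s3a")


def strip_alias(path: str, protocols: tuple = S3_PROTOCOLS) -> str:
    """Remove whichever accepted prefix the caller used; the store never sees it."""
    for proto in protocols:
        prefix = f"{proto}://"
        if path.startswith(prefix):
            return path[len(prefix):]
    return path  # already a bare bucket/key path


print(strip_alias("s3://icechunk-test/ryan"))   # icechunk-test/ryan
print(strip_alias("s3a://icechunk-test/ryan"))  # icechunk-test/ryan
```

Because both aliases reduce to the same bucket/key, nothing is lost by dropping the prefix here, unlike the http/https case.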

It might be reasonable for a backend (taking s3 as an example) to remember that it was created with protocol "s3" and return paths as "s3://..." even when the path passed in was "s3a://..." (and vice versa). However, this would require a decent amount of rewriting.
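The idea of a backend that remembers its creation protocol could look roughly like the sketch below. ProtocolAwareFS and its methods are hypothetical, written only to illustrate the suggestion; they are not part of fsspec or s3fs:

```python
class ProtocolAwareFS:
    """Hypothetical backend that remembers which alias it was created with."""

    aliases = ("s3", "s3a")

    def __init__(self, protocol: str = "s3"):
        if protocol not in self.aliases:
            raise ValueError(f"unsupported protocol: {protocol!r}")
        self.protocol = protocol

    def strip(self, url: str) -> str:
        # Accept any alias on input, as the real backends do.
        for alias in self.aliases:
            prefix = f"{alias}://"
            if url.startswith(prefix):
                return url[len(prefix):]
        return url

    def full_url(self, url: str) -> str:
        # Re-attach the creation protocol, whatever alias the caller used.
        return f"{self.protocol}://{self.strip(url)}"


fs = ProtocolAwareFS("s3")
print(fs.full_url("s3a://icechunk-test/ryan"))  # s3://icechunk-test/ryan
```

The catch is that every method returning paths would need to route through something like full_url, which is where the rewriting cost comes from.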

Note that fs.unstrip_protocol(path) should produce full URLs.
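As a user-side workaround, passing the path returned by url_to_fs through fs.unstrip_protocol should give a full URL for both backends. The sketch below approximates that behaviour in plain Python so it runs without s3fs or an HTTP client. unstrip_like is a hypothetical name, and the assumption that a path already carrying one of the backend's protocols is returned unchanged, while a bare path gets the first listed protocol, is an approximation rather than a documented guarantee:

```python
def unstrip_like(path: str, protocols: tuple) -> str:
    """Hypothetical approximation of unstrip_protocol for a backend's aliases."""
    for proto in protocols:
        if path.startswith(f"{proto}://"):
            return path  # already a full URL (the HTTP case)
    return f"{protocols[0]}://{path}"  # bare path: re-attach the first protocol


print(unstrip_like("icechunk-test/ryan", ("s3", "s3a")))
print(unstrip_like("http://earthmover.io/joe", ("http", "https")))
```

With fsspec itself, the equivalent is fs, path = fsspec.url_to_fs(url) followed by fs.unstrip_protocol(path), which avoids special-casing the HTTP output.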

martindurant, Oct 13 '24 17:10

I think I see where you are coming from. From a user's perspective, though, it's a bummer to have to special-case the output of url_to_fs for the HTTP filesystems.

jhamman, Oct 14 '24 04:10