filesystem_spec
Inconsistent use of protocol specific options
Setting protocol specific options has been a convenient method for overriding the default options for each protocol. E.g., the Azure blob storage implementation behaves peculiarly and requires setting anon=False to use the credentials in the environment (https://github.com/fsspec/adlfs/issues/348).
So for paths provided by an application, we might do:
fsspec.open(..., az={"anon": False})
This option is ignored for local paths and used for az:// URLs, which lets us configure defaults for each protocol. Unfortunately, this doesn't work with http(s) URLs, since the kwargs are forwarded directly to aiohttp, e.g. https://github.com/fsspec/filesystem_spec/blob/561428ca18a9865d8f63fe188a590d791ec52c92/fsspec/implementations/http.py#L826
- If this is an intended usage mode: How about dropping all protocol specific kwargs before forwarding to the http implementation?
- If this is not an intended feature: How can we set per-protocol defaults otherwise? If we have to manually parse the URLs and assign different kwargs to the fsspec.open method, the convenience of this API greatly diminishes.
Either way, I'm happy to contribute code if we can agree on a solution.
To answer part of your question, yes you can use protocol-specific arguments to configure the HTTP backend:
>>> of = fsspec.open("http://google.com", http={"encoded": True})
>>> of.fs.encoded
True
(This is exactly equivalent to fsspec.open("http://google.com", encoded=True))
As for the second part, what happens to options that may have been intended for other backends is undefined behaviour. I can see how dropping them would be convenient, but
- it would be tricky to exhaustively check against all possible protocols
- the alternative would be to require those kwargs to be inside something like get_kwargs={http-get-specific-things}, which is cumbersome for the user and a breaking change.
The intended use was originally only for multi-component URLs like "simplecache::http://server/path", where we know the two protocols involved, and can find the args to send to each; any "extra" kwargs always go to the foremost component, in this case simplecache.
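The routing rule described above can be modelled in a few lines. This is only an illustrative sketch, not fsspec's actual implementation: `route_kwargs` is a hypothetical helper, and it assumes every component of the URL names its protocol explicitly.

```python
def route_kwargs(url, **kwargs):
    """Toy model of the routing rule: each protocol named in a chained URL
    receives the dict passed under its own name, and any leftover kwargs
    go to the foremost component. Hypothetical helper, not fsspec code."""
    # "simplecache::http://server/path" -> ["simplecache", "http"]
    protocols = [part.split("://", 1)[0] for part in url.split("::")]
    routed = {}
    for protocol in protocols:
        routed.setdefault(protocol, {}).update(kwargs.pop(protocol, {}))
    # Whatever was not claimed by a protocol in the URL lands on the first one.
    routed[protocols[0]].update(kwargs)
    return routed


chained = route_kwargs(
    "simplecache::http://server/path",
    simplecache={"cache_storage": "/tmp/cache"},
    http={"encoded": True},
    az={"anon": False},  # no "az" component in this URL
)
single = route_kwargs("http://google.com", az={"anon": False})
print(chained)
print(single)
```

In this model the unclaimed `az` entry ends up among the kwargs of the foremost component: `simplecache` in the chained case, and `http` in the single-protocol case, which corresponds to the situation reported at the top of this thread.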
@martindurant Thanks for the quick reply!
I understand; neither is a great option. Perhaps the best option is the opposite, i.e. allowing or requiring per-protocol defaults in a dedicated argument that passes kwargs down only to the relevant implementation? Something like:
fsspec.open(..., protocol_defaults={"az": {"anon": False}, "http": {"encoded": True}})
I realize this is a bit less convenient, but it's fairly confusing as is now with protocol-specific arguments making it down to the individual implementations.
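Until something like this exists, the same effect can be approximated in user code by filtering the defaults before calling fsspec.open. The sketch below is a minimal version under stated assumptions: `protocols_in` and `select_protocol_kwargs` are hypothetical names, the URL parsing is deliberately naive (it splits on "::" and "://" and treats a scheme-less path as "file"), and keys must match the scheme literally, so "https" would need its own entry.

```python
def protocols_in(url):
    """Return the protocol names appearing in a (possibly chained) URL.
    Naive parsing for illustration; hypothetical helper."""
    parts = url.split("::")
    protocols = []
    for index, part in enumerate(parts):
        if "://" in part:
            protocols.append(part.split("://", 1)[0])
        elif index < len(parts) - 1:
            protocols.append(part)  # bare component such as "simplecache"
        else:
            protocols.append("file")  # no scheme: assume a local path
    return protocols


def select_protocol_kwargs(url, protocol_defaults):
    """Keep only the defaults whose protocol actually occurs in `url`."""
    return {
        protocol: dict(protocol_defaults[protocol])
        for protocol in protocols_in(url)
        if protocol in protocol_defaults
    }


defaults = {"az": {"anon": False}, "http": {"encoded": True}}
for url in ("az://container/blob.csv", "http://google.com", "/tmp/data.csv"):
    print(url, select_protocol_kwargs(url, defaults))
```

The result can then be unpacked into the existing call, e.g. fsspec.open(url, **select_protocol_kwargs(url, defaults)), so that az= is only ever passed for az:// URLs and nothing unexpected reaches the HTTP implementation.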
That could be a possible solution, but we could not disallow az= directly now, as it is already in use; at least, not without a proper deprecation. I'm not convinced that the longer form would be very popular.