filesystem_spec
filesystem_spec copied to clipboard
http get() call creates malformed dest path
See original discussion here and a linked bug here
Issue
Creating an http file system and using an rpath with query params results in a malformed destination path.
Steps to Reproduce
Using the test below, update lpath to a path you can write to.
def test_http_output():
kwargs = {}
fs = fsspec.implementations.http.HTTPFileSystem(fsspec.filesystem("https", **kwargs))
expected_output_path = "/var/myloc/outputfile.txt"
rpath = "https://httpbin.org/gzip?test=value"
lpath = expected_output_path
fs.get(rpath, lpath)
This behavior only happens (in my testing) when the rpath uses query params. That triggers exists to be set to True because has_magic returns True.
It also requires lpath and rpath to be strings rather than lists.
Expected Results
A file is written to /var/myloc/outputfile.txt
Actual Results
A file is written to /var/myloc/outputfile.txt/gzip\?test=value
Notes
It's also possible to test the underlying path construction using testutils.py test_other_paths and adding the following parameterized test: (["/path1"], "/path2/path3.txt", True, ["/path2/path3.txt"])
I don't really understand the use of has_magic when determining the exists flag. it seems to pick up on the query param segment and change the behavior of the downstream (other_paths) call. without ?test=value the output is /var/myloc/outputfile.txt, but if the query is included the result is /var/myloc/outputfile.txt/gzip\?test=value; is that intended?
breaking apart the other_paths method into use-case specific implementations might allow the interface to be cleaned up (no boolean flags) and the code to be more specific. for instance, exists doesn't make sense when called from the get() flow, as the source should always exist.