filesystem_spec icon indicating copy to clipboard operation
filesystem_spec copied to clipboard

http get() call creates malformed dest path

Open cgrass opened this issue 1 year ago • 1 comments

See original discussion here and a linked bug here

Issue

Creating an http file system and using an rpath with query params results in a malformed destination path.

Steps to Reproduce

Using the test below, update lpath to a path you can write to.

def test_http_output():
    kwargs = {}
    fs = fsspec.implementations.http.HTTPFileSystem(fsspec.filesystem("https", **kwargs))
    expected_output_path = "/var/myloc/outputfile.txt"
    rpath = "https://httpbin.org/gzip?test=value"
    lpath = expected_output_path
   fs.get(rpath, lpath)

This behavior only happens (in my testing) when the rpath uses query params. That triggers exists to be set to True because has_magic returns True.

It also requires lpath and rpath to be strings rather than lists.

Expected Results

A file is written to /var/myloc/outputfile.txt

Actual Results

A file is written to /var/myloc/outputfile.txt/gzip\?test=value

Notes

It's also possible to test the underlying path construction using testutils.py test_other_paths and adding the following parameterized test: (["/path1"], "/path2/path3.txt", True, ["/path2/path3.txt"])

cgrass avatar Jan 16 '24 16:01 cgrass

I don't really understand the use of has_magic when determining the exists flag. it seems to pick up on the query param segment and change the behavior of the downstream (other_paths) call. without ?test=value the output is /var/myloc/outputfile.txt, but if the query is included the result is /var/myloc/outputfile.txt/gzip\?test=value; is that intended?

breaking apart the other_paths method into use-case specific implementations might allow the interface to be cleaned up (no boolean flags) and the code to be more specific. for instance, exists doesn't make sense when called from the get() flow, as the source should always exist.

cgrass avatar Jan 16 '24 21:01 cgrass