filesystem_spec icon indicating copy to clipboard operation
filesystem_spec copied to clipboard

github file 404 when folder has escaped symbols

Open DaveBoy opened this issue 1 year ago • 11 comments

A 404 error may occur when a github folder has escaped symbols, because the actual raw address is not the folder name, but partially escaped. Example below: url: https ://github.com/msojocs/fiddler-everywhere-enhance/raw/master/server/file/api.getfiddler.com/c/NUNHMjIyNkpNMDg4ZjMzMjZlLTk0OWQtNDgwNi1hMTc2LTFlZGY1MWJiOWRhOA%253D%253D/v/4.6.1/p/win/latest.yml

it's folder path: server/file/api.getfiddler.com/c/NUNHMjIyNkpNMDg4ZjMzMjZlLTk0OWQtNDgwNi1hMTc2LTFlZGY1MWJiOWRhOA%3D%3D/v/4.6.1/p/win/latest.yml

in folderpath: %3D%3D in url: %253D%253D

DaveBoy avatar Sep 04 '24 13:09 DaveBoy

@DaveBoy , could you show how you are trying to use your URL?

martindurant avatar Sep 04 '24 13:09 martindurant

@DaveBoy , could you show how you are trying to use your URL?

I use it like this, it can download other files from this repo, but it can't download files under special folders

user = "msojocs"
project = "fiddler-everywhere-enhance"
username = "DaveBoy"
token = "my token"
down_path = "server/file/api.getfiddler.com/c"
save_path = r"D:\temp\fiddler-everywhere-enhance"
branch = "master"
down_type = "tree"
destination = Path(save_path)
destination.mkdir(exist_ok=True, parents=True)
if os.path.exists(save_path + r"\exception.txt"):
    os.remove(save_path + r"\exception.txt")
try:
    fs = fsspec.filesystem("github", org=user, repo=project, username=username,
                           sha=branch if len(branch) != 0 else None,
                           token=token)
    if down_type == "blob":
        fs.get(down_path, (destination / down_path).as_posix())
    else:
        fs.get(down_path, destination.as_posix(), recursive=True)
except Exception as e:
    traceback.print_exc()
    info = traceback.format_exc()
    with open(save_path + r"\exception.txt", 'w') as f:
        f.write(info)

DaveBoy avatar Sep 04 '24 13:09 DaveBoy

Ah, OK. Probably in fsspec.implementations.github , the url passed to requests in _open, ls and cat should be url-encoded . Would you like to try this?

martindurant avatar Sep 04 '24 13:09 martindurant

Ah, OK. Probably in fsspec.implementations.github , the url passed to requests in _open, ls and cat should be url-encoded . Would you like to try this?

tks,i will test this plan tomorrom

ps:i found this api can get the right url,But I'm not sure it's a better plan than this.

DaveBoy avatar Sep 04 '24 13:09 DaveBoy

Ah, OK. Probably in fsspec.implementations.github , the url passed to requests in _open, ls and cat should be url-encoded . Would you like to try this?

tks very much.I tried this solution and successfully solved my problem. My modifications are as follows,Hope to help other friends who encounter this problem (testing get and cat is normal, I am not sure whether ls needs to be modified):

Related file test address You can use the files under this address 图片

DaveBoy avatar Sep 04 '24 15:09 DaveBoy

Can you please put your changes into a PR?

martindurant avatar Sep 04 '24 15:09 martindurant

Can you please put your changes into a PR?

I'm very happy to do this, I'll sort out the relevant content and submit it to pr tomorrow

DaveBoy avatar Sep 04 '24 15:09 DaveBoy

Ah, OK. Probably in fsspec.implementations.github , the url passed to requests in _open, ls and cat should be url-encoded . Would you like to try this?

get is ok, but cat is not usable via "fs.cat(r"file/test_file/test1.txt")", so I can't test it. It raises "ValueError: too many values ​​to unpack" at "for u, sh in paths". Am I using it wrong?

DaveBoy avatar Sep 05 '24 03:09 DaveBoy

 def cat(self, path, recursive=False, on_error="raise", **kwargs):
        paths = self.expand_path(path, recursive=recursive)
        urls = [
            self.rurl.format(org=self.org, repo=self.repo, path=u, sha=self.root)
            for u, sh in paths
        ]

@martindurant Here "ValueError: too many values ​​to unpack" is raised at "for u, sh in path". expand_path always returns a list, is there any example of its use?

DaveBoy avatar Sep 08 '24 03:09 DaveBoy

for u, sh in paths seems simply wrong, it should be for u in paths as you suspected. expand_path returns "list of all matching paths".

martindurant avatar Sep 09 '24 13:09 martindurant

for u, sh in paths seems simply wrong, it should be for u in paths as you suspected. expand_path returns "list of all matching paths".

@martindurant

for u, sh in paths

it's the cat method in /implementations/github.py

Thanks for the reply, I will test the quote method after the cat method fix and submit it after the test passes

DaveBoy avatar Sep 09 '24 14:09 DaveBoy