
Feature request: directly open a file url?

Open longern opened this issue 6 years ago • 20 comments

Is there a method that can open a file URL directly, like smart-open? https://pypi.org/project/smart-open/

open('s3://commoncrawl/robots.txt')

longern avatar Aug 05 '19 02:08 longern

I have the same need and already have some code for it.

chfw avatar Aug 05 '19 06:08 chfw

I have a set of similar use cases here:

https://github.com/moremoban/moban/blob/dev/moban/file_system.py

where you can find:

  1. read_text(a_fs_url)
  2. read_binary(a_fs_url)

And I would need similar functionality from os.path:

  1. os.path.exists -> the_file_system.path_exists(a_fs_url)
  2. os.path.isfile -> the_file_system.is_file(a_fs_url)
  3. os.path.isdir -> the_file_system.is_dir(a_fs_url) ....

But I thought I was the only one with such a need, and I am not sure whether these use cases fit pyfs2's concept: always open the parent directory, then open a file.

chfw avatar Aug 05 '19 07:08 chfw

Not as such, but there is the open method which will split a path from the FS URL.

>>> from fs.opener import open
>>> zip_fs, path = open("zip://foo.zip!/bar/egg")
>>> zip_fs.readtext(path)
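
For readers unfamiliar with the `!` separator used in that URL, here is a rough, stdlib-only sketch of how such a string divides into a filesystem part and a path inside it. `split_fs_url` is a hypothetical helper written for illustration, not part of PyFilesystem2, and the library's own parser handles more (credentials, query parameters).

```python
from typing import Optional, Tuple


def split_fs_url(url: str) -> Tuple[str, Optional[str]]:
    """Split 'protocol://resource!path' at the first '!'.

    Hypothetical helper: it only mimics the '!' convention and does no
    validation, unlike the library's own parser.
    """
    fs_part, has_path, path = url.partition("!")
    # Without a '!' there is no separate in-filesystem path to return.
    return fs_part, (path if has_path else None)


print(split_fs_url("zip://foo.zip!/bar/egg"))
print(split_fs_url("s3://commoncrawl/robots.txt"))
```

Under this sketch, the second call returns no path at all, which suggests why a plain URL such as `s3://commoncrawl/robots.txt` is not treated as "filesystem plus file" by this convention.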

willmcgugan avatar Aug 05 '19 07:08 willmcgugan

However, fs.opener.open won't work for a nonexistent path.

longern avatar Aug 05 '19 08:08 longern

@longern What would you expect to happen for a nonexistent path?

willmcgugan avatar Aug 05 '19 09:08 willmcgugan

Some of the methods may accept a nonexistent path as the argument, such as mkdir, exists, and sometimes write to a new file. Is there any shortcut for them?

exists('s3://commoncrawl/robots.txt')
mkdir('ftp://some-url/some-path/dirname')

longern avatar Aug 05 '19 09:08 longern

I'm not sure I follow. Are you looking for something like this?

with open_fs("s3://commoncrawl") as fs:
    robots_exists = fs.exists("robots.txt")

willmcgugan avatar Aug 05 '19 09:08 willmcgugan

Sometimes the file URL comes from user input, so I need to split the FS URL and the path for every operation. I'm looking for methods that operate directly on a file URL.

longern avatar Aug 05 '19 09:08 longern

You can use this method to parse FS URLs.

willmcgugan avatar Aug 05 '19 10:08 willmcgugan

@willmcgugan The documentation for ParseResult mentions a path part, but https://pyfilesystem2.readthedocs.io/en/latest/openers.html doesn't document how to include the path in an FS URL.

lurch avatar Aug 12 '19 10:08 lurch

And it says how to open a path, not how to open a file.

chfw avatar Aug 12 '19 15:08 chfw

I can turn my module into an independent library if there is enough interest.

https://github.com/moremoban/moban/blob/dev/moban/file_system.py

chfw avatar Aug 24 '19 18:08 chfw

Or I can upstream it into PyFilesystem2 if it fits its mission.

chfw avatar Aug 24 '19 18:08 chfw

I was looking for this, but I found that fs.opener.open didn't work for a file in the current directory. It just keeps saying that the root path does not exist.

CMCDragonkai avatar Jan 24 '20 03:01 CMCDragonkai

Seems like we just have to use:

import os

(fspath, filename) = os.path.split('s3://commoncrawl/a/b/c/robots.txt')
# note that this keeps the query parameter in the filename

Not sure if query parameters matter here.
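
A quick check of what this kind of split produces, using `posixpath` so the result does not depend on the host OS. The `versionId=1` query string and the bucket-less URL are made-up inputs for illustration only.

```python
import posixpath

# The directory part becomes the FS URL, the last component the file name.
print(posixpath.split("s3://commoncrawl/a/b/c/robots.txt"))

# A query string stays attached to the file name.
print(posixpath.split("s3://commoncrawl/robots.txt?versionId=1"))

# With no directory component, the slashes after the scheme are stripped,
# leaving a head that is no longer a usable FS URL.
print(posixpath.split("s3://robots.txt"))
```

The last call returns `('s3:', 'robots.txt')`, so a URL with nothing between the scheme and the file name would need special handling. A query string that itself contains `/` would also be split in the wrong place.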

CMCDragonkai avatar Feb 06 '20 02:02 CMCDragonkai

The problem is some file system abstractions like s3 and gs use the first component of the URL as the bucket and don't expose it as part of the abstraction. It's an argument to the constructor, basically. You'd have to have file systems implement a classmethod to open an arbitrary URL to get around this.

dargueta avatar Feb 06 '20 18:02 dargueta

Example? Are you saying the s3 fs impl cannot open the path including the directory?

CMCDragonkai avatar Feb 07 '20 00:02 CMCDragonkai

Sorry that was a bad example

dargueta avatar Feb 10 '20 23:02 dargueta

I wrote something like this:

import contextlib
import urllib.parse
from typing import IO, Iterator, Tuple

import fs


def parse_file_url(url: str) -> Tuple[str, str]:
    fs_url = ''
    file_path = ''
    url_parsed = urllib.parse.urlparse(url)
    # if there's no scheme, it's a filesystem path
    if not url_parsed.scheme:
        fs_url += 'osfs://'
        # if it is an absolute path, the fs_url must start at the root
        if url_parsed.path.startswith('/'):
            fs_url += '/'
        # remove any leading slashes
        file_path += url_parsed.path.lstrip('/')
        if url_parsed.params:
            file_path += f';{url_parsed.params}'
        if url_parsed.fragment:
            file_path += f'#{url_parsed.fragment}'
    else:
        if not url_parsed.path:
            fs_url += f'{url_parsed.scheme}://'
            if url_parsed.query:
                fs_url += f'?{url_parsed.query}'
            file_path += url_parsed.netloc
        else:
            fs_url += f'{url_parsed.scheme}://'
            if url_parsed.netloc:
                fs_url += url_parsed.netloc
            if url_parsed.query:
                fs_url += f'?{url_parsed.query}'
            file_path += url_parsed.path
            if url_parsed.params:
                file_path += f';{url_parsed.params}'
            if url_parsed.fragment:
                file_path += f'#{url_parsed.fragment}'
    return (fs_url, file_path)


@contextlib.contextmanager
def open_file_url(url: str,
                  mode: str = 'r',
                  buffering=-1,
                  encoding=None,
                  errors=None,
                  newline='') -> Iterator[IO]:
    (fs_url, file_path) = parse_file_url(url)
    with fs.open_fs(fs_url) as fs_:
        with fs_.open(file_path, mode, buffering, encoding, errors,
                      newline) as file:
            yield file
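
To see why `parse_file_url` branches on `scheme` and `path`, it can help to look at how `urllib.parse.urlparse` decomposes a few representative inputs. This is only an inspection sketch using the standard library; the sample URLs are illustrative.

```python
from urllib.parse import urlparse

samples = [
    "s3://commoncrawl/robots.txt",  # scheme, netloc and path all present
    "osfs://foo",                   # scheme and netloc, but empty path
    "/tmp/data.txt",                # no scheme: plain absolute path
    "data/notes.txt",               # no scheme: plain relative path
]

for url in samples:
    parts = urlparse(url)
    # Show only the fields that parse_file_url inspects first.
    print(f"{url!r}: scheme={parts.scheme!r} netloc={parts.netloc!r} path={parts.path!r}")
```

One consequence worth noting: for `s3://commoncrawl/robots.txt` the parsed path keeps its leading slash, so the file path handed to the opened filesystem is `/robots.txt`.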

CMCDragonkai avatar Feb 13 '20 03:02 CMCDragonkai

I ended up with something like this:

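import os
from contextlib import contextmanager
from typing import IO, Iterator, Optional

from fs import open_fs


@contextmanager
def open_file(url: str,
              mode: str = "r",
              create: bool = False,
              buffering: int = -1,
              encoding: Optional[str] = None,
              errors: Optional[str] = None,
              newline: str = "",
              **options) -> Iterator[IO]:
    # Any mode that can modify the file needs a writeable filesystem.
    writeable = any(flag in mode for flag in "wax+")
    # Treat everything before the last "/" as the FS URL of the parent directory.
    dir_url, file_name = os.path.split(url)
    with open_fs(dir_url, writeable, create) as fs_:
        with fs_.open(file_name, mode, buffering, encoding, errors, newline, **options) as file_:
            yield file_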

mezhaka avatar Oct 28 '20 10:10 mezhaka