webdav4 when file is large, seek is very slow

Open observerss opened this issue 2 years ago • 1 comments

In stream.py, the seek function is:

    def seek(self, offset: int, whence: int = 0) -> int:  # noqa: C901
        """Seek the file object."""
        if whence == 0:
            loc = offset
        elif whence == 1:
            if offset >= 0:
                self.read(offset)
                return self.loc
            loc = self.loc + offset
        elif whence == 2:
            if not self.size:
                raise ValueError("cannot seek to the end of file")
            loc = self.size + offset
        else:
            raise ValueError(f"invalid whence ({whence}, should be 0, 1 or 2)")
        if loc < 0:
            raise ValueError("Seek before start of file")
        if loc and not self.supports_ranges:
            raise ValueError("server does not support ranges")

        self.close()
        self._cm = iter_url(self.client, self.url, pos=loc, chunk_size=self.chunk_size)
        #  pylint: disable=no-member
        _, self._iterator = self._cm.__enter__()
        self.loc = loc
        return loc

When whence == 1 and offset >= 0, seek reads forward all the way to the target offset:

            if offset >= 0:
                self.read(offset)
                return self.loc
            loc = self.loc + offset

Seeking 1 GB forward therefore reads 1 GB of content first, which is very inefficient. If I comment out the if statement, the seek operation still works: it creates a new iterator and uses the Range header to jump straight to the position.
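
To illustrate the Range-based alternative, here is a minimal sketch of how the target position and the request header could be computed without reading any data. The helper name range_header_for_seek is hypothetical and not part of webdav4. The only outside fact it relies on is the standard HTTP open-ended byte range syntax, bytes=N-.

```python
from typing import Dict, Optional


def range_header_for_seek(
    loc: int, offset: int, whence: int = 0, size: Optional[int] = None
) -> Dict[str, str]:
    """Return the Range header that would position a new request.

    Hypothetical helper for illustration only; it mirrors the whence
    handling of the seek function quoted above but never reads data.
    """
    if whence == 0:        # absolute position
        target = offset
    elif whence == 1:      # relative to the current position
        target = loc + offset
    elif whence == 2:      # relative to the end of the file
        if not size:
            raise ValueError("cannot seek to the end of file")
        target = size + offset
    else:
        raise ValueError(f"invalid whence ({whence}, should be 0, 1 or 2)")
    if target < 0:
        raise ValueError("Seek before start of file")
    # Open-ended byte range: everything from target to the end of the file.
    return {"Range": f"bytes={target}-"}
```

With this approach, a relative seek of 1 GB costs one new request rather than 1 GB of transfer, provided the server honours range requests.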

Mar 29 '23 06:03 observerss

I think it was added on the assumption that for SEEK_CUR the offsets are small and might already be cached in our buffer, and because I wanted to avoid resetting the iterator as much as possible (not all WebDAV servers support ranges).

Feel free to propose a PR. 🙂

Mar 29 '23 07:03 skshetry
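
One way to reconcile both concerns is a hybrid: read forward for small offsets, or when the server lacks range support, and reopen with a Range request otherwise. The sketch below is a toy in-memory model, not webdav4 code. The class HybridSeekStream, the read_ahead_limit threshold and its default value, and the counters are all assumptions made for illustration.

```python
class HybridSeekStream:
    """Toy stand-in for a remote stream; NOT the webdav4 implementation."""

    def __init__(self, data: bytes, supports_ranges: bool = True,
                 read_ahead_limit: int = 64 * 1024) -> None:
        self._data = data                  # pretend this lives on the server
        self.supports_ranges = supports_ranges
        self.read_ahead_limit = read_ahead_limit  # hypothetical tuning knob
        self.loc = 0
        self.bytes_transferred = 0         # bytes pulled from the "server"
        self.reopens = 0                   # simulated range requests

    def read(self, n: int) -> bytes:
        chunk = self._data[self.loc:self.loc + n]
        self.loc += len(chunk)
        self.bytes_transferred += len(chunk)
        return chunk

    def _reopen_at(self, loc: int) -> None:
        # Stands in for closing the response and issuing a new request
        # with a "Range: bytes=<loc>-" header.
        self.reopens += 1
        self.loc = loc

    def seek(self, offset: int, whence: int = 0) -> int:
        if whence == 0:
            loc = offset
        elif whence == 1:
            loc = self.loc + offset
        elif whence == 2:
            loc = len(self._data) + offset
        else:
            raise ValueError(f"invalid whence ({whence}, should be 0, 1 or 2)")
        if loc < 0:
            raise ValueError("Seek before start of file")

        forward = loc - self.loc
        if forward == 0:
            return self.loc
        # Short forward hops, or any forward hop on a server without
        # range support, are served by reading and discarding.
        if forward > 0 and (forward <= self.read_ahead_limit
                            or not self.supports_ranges):
            self.read(forward)
            return self.loc
        if loc and not self.supports_ranges:
            raise ValueError("server does not support ranges")
        self._reopen_at(loc)
        return self.loc
```

In this model, a small SEEK_CUR hop keeps the existing iterator, a large one costs a single reopen instead of a full read, and servers without range support keep the current read-forward behaviour. A suitable threshold would depend on chunk size and latency, so the default here is only a placeholder.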
