filesystem_spec
fsspec.fuse with zstd file crashes when trying to read from a file
I have a script called fsspec:
#!/usr/bin/env python3
import sys
if '-f' in sys.argv:
    del sys.argv[sys.argv.index("-f")]
from fsspec.implementations.tar import TarFileSystem as tafs
fs = tafs(sys.argv[1])
print(f"Mount {sys.argv[1]} at {sys.argv[2]}")
import fsspec.fuse
fsspec.fuse.run(fs, "./", sys.argv[2])
I created test files and tested them like this:
echo bar > foo
tar -cf foo.tar ./foo
mkdir mounted
for c in bzip2 gzip zstd xz; do
    $c -f -k foo.tar
done
for c in bz2 gz xz zst; do
    echo "== Testing with foo.tar.$c =="
    sleep 1s
    ./fsspec foo.tar.$c mounted &
    sleep 0.5s
    cat mounted/foo
    fusermount -u mounted
done
Output:
== Testing with foo.tar.bz2 ==
bar
== Testing with foo.tar.gz ==
bar
== Testing with foo.tar.xz ==
bar
== Testing with foo.tar.zst ==
Uncaught critical exception from FUSE operation read, aborting.
Traceback (most recent call last):
File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 734, in _wrapper
return func(*args, **kwargs) or 0
File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 844, in read
ret = self.operations('read', self._decode_optional_path(path), size,
File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 1075, in __call__
return getattr(self, op)(*args)
File "/home/user/.local/lib/python3.10/site-packages/fsspec/fuse.py", line 78, in read
f.seek(offset)
io.UnsupportedOperation: File or stream is not seekable.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 737, in _wrapper
if e.errno > 0:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
cat: mounted/foo: Software caused connection abort
cat: mounted/foo: Transport endpoint is not connected
For some reason, the file object created by the FUSEr.open call does not seem to be seekable when used in FUSEr.read.
Note that the seek isn't even necessary in this case because it tries to seek to offset 0, where the file already is. Adding an if f.tell() != offset: check before the seek should therefore fix it for my case, but then I get a similar error from another place:
Uncaught critical exception from FUSE operation read, aborting.
Traceback (most recent call last):
File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 734, in _wrapper
return func(*args, **kwargs) or 0
File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 844, in read
ret = self.operations('read', self._decode_optional_path(path), size,
File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 1075, in __call__
return getattr(self, op)(*args)
File "/home/user/.local/lib/python3.10/site-packages/fsspec/fuse.py", line 81, in read
out = f.read(size)
File "/usr/lib/python3.10/tarfile.py", line 700, in readinto
buf = self.read(len(b))
File "/usr/lib/python3.10/tarfile.py", line 688, in read
self.fileobj.seek(offset + (self.position - start))
OSError: cannot seek zstd decompression stream backwards
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/.local/lib/python3.10/site-packages/fuse.py", line 737, in _wrapper
if e.errno > 0:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
Note that the other exception from site-packages/fuse.py is unrelated. It is what I get after monkey-patching a long-standing bug.
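The guard proposed above (seek only when the current position differs from the requested offset) could look roughly like the sketch below. This is not the actual fsspec.fuse code: ForwardOnlyReader and read_at are hypothetical names, and the reader is only a stand-in for a decompression stream that refuses to seek.

```python
import io


class ForwardOnlyReader(io.RawIOBase):
    """Hypothetical stand-in for a stream that can be read but never seeked."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def tell(self):
        # Report the position without going through seek().
        return self._buf.tell()

    def seek(self, offset, whence=io.SEEK_SET):
        raise io.UnsupportedOperation("File or stream is not seekable.")

    def readinto(self, b):
        return self._buf.readinto(b)


def read_at(f, offset, size):
    """Read size bytes at offset, seeking only if the position differs."""
    if f.tell() != offset:
        f.seek(offset)
    return f.read(size)


f = ForwardOnlyReader(b"bar\n")
print(read_at(f, 0, 2))  # already at offset 0, so no seek is attempted
print(read_at(f, 2, 2))  # sequential read, still no seek
```

The guard only helps while reads stay sequential. Any request for an earlier offset still reaches seek() and fails, which matches the second traceback, where tarfile itself seeks backwards in the underlying stream.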
I formulated this issue as related to fsspec.fuse, but it seems to me that this is a more general problem and would also appear when using fsspec as a library.
I'm surprised that gzip and bzip2 do work because they should have had the exact same issue with non-seekability.
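One possible explanation, offered as an assumption rather than a confirmed diagnosis of what fsspec does internally: the file objects in Python's standard gzip, bz2, and lzma modules emulate backward seeks by rewinding and decompressing again, whereas the zstd traceback above comes from a stream that refuses to seek backwards. The sketch below only checks the standard-library side of that claim, using in-memory data. can_seek_backwards is a hypothetical helper name.

```python
import bz2
import gzip
import io
import lzma


def can_seek_backwards(module, payload=b"bar\n" * 1000):
    """Return True if module's file object allows seeking back to offset 0."""
    compressed = module.compress(payload)
    with module.open(io.BytesIO(compressed), "rb") as f:
        f.read(100)  # advance the decompressor past the start
        f.seek(0)    # backward seek, emulated by rewinding and re-reading
        return f.read(4) == payload[:4]


for module in (gzip, bz2, lzma):
    print(module.__name__, can_seek_backwards(module))
```

If this is indeed the reason, the zstd case would need either a seekable wrapper around the decompressor or a reopen-and-skip-forward fallback whenever an earlier offset is requested.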
I am using Python 3.10.12.
@mxmlnkn Mind submitting the monkeypatch as a PR?
I think there has been some confusion. The patch is for fusepy, not fsspec. A corresponding open PR already exists here.