CRC check failed when reading after seeking
System Information:
- Ubuntu 20.10
- Python 3.8.10
- rarfile 4.0
- unar v1.10.1
- UNRAR 5.61 beta 1
Both unar and unrar are in my PATH, so I don't know which one is used. I don't think I have bsdtar installed.
Steps to reproduce:
- Create test rar:
echo foo > bar && rar a bar.rar bar
- Open with rarfile and seek and read:
import rarfile
rar = rarfile.RarFile("bar.rar")
file = rar.open("bar")
# These read calls were only to show that rarfile generally works but it seems they are somewhat important for reproduction!
file.read(1) # b'f'
file.read(1) # b'o'
file.read(1) # b'o'
file.read(1) # b'\n'
file.read(1) # b''
# Seeking to 0 is no problem. Again, these calls can be omitted for reproducing the problem
file.seek(0)
file.read() # b'foo\n'
# Here begins the problematic sequence
file.seek(1) # 1
file.read()
---------------------------------------------------------------------------
BadRarFile Traceback (most recent call last)
<ipython-input-42-f3fc120c03c1> in <module>
----> 1 file.read()
~/.local/lib/python3.8/site-packages/rarfile.py in read(self, n)
2200 if not data or self._remain == 0:
2201 # self.close()
-> 2202 self._check()
2203 return data
2204
~/.local/lib/python3.8/site-packages/rarfile.py in _check(self)
2216 raise BadRarFile("Failed the read enough data")
2217 if final != exp:
-> 2218 raise BadRarFile("Corrupt file - CRC check failed: %s - exp=%r got=%r" % (
2219 self._inf.filename, exp, final))
2220
BadRarFile: Corrupt file - CRC check failed: bar - exp=2117232040 got=3195718521
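One way to tell whether the archive or the reader is at fault is to compute the CRC-32 of the known contents directly and compare it with the exp value from the message. This is only a minimal sketch, assuming the checksum stored for the entry is a standard CRC-32 over the uncompressed bytes (the same one zlib computes); the variable names are mine.

```python
import zlib

# Values copied from the BadRarFile message above.
expected_from_archive = 2117232040
got_from_reader = 3195718521

# The file was created with `echo foo > bar`, so its contents are b"foo\n".
actual = zlib.crc32(b"foo\n")

archive_matches = actual == expected_from_archive
reader_matches = actual == got_from_reader
print(f"crc32={actual:#010x} archive_matches={archive_matches} reader_matches={reader_matches}")
```

If archive_matches is true while reader_matches is false, the stored checksum is fine and the mismatch comes from whatever bytes the reader hashed after the backward seek.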
Forward seeking does not seem to be a problem. This works:
rar = rarfile.RarFile("bar.rar")
file = rar.open("bar")
file.seek(1)
file.read()
However, as soon as I seek backwards, the problem arises, even with crc_check=False, which makes it even weirder!
rar = rarfile.RarFile("bar.rar", crc_check=False)
file = rar.open("bar")
file.read(2)
file.seek(1)
file.read() # exception!
I took a quick look at the source and documentation and it seems that backward seeking is supposed to be implemented by reopening the file. Somehow that reopen isn't effective enough. My workaround, which also simply reopens the file, works without problems:
import io

class RawFileInsideRar(io.RawIOBase):
    def __init__(self, reopen, file_size):
        # reopen is a callable returning a fresh file object positioned at offset 0
        self.reopen = reopen
        self.fileobj = reopen()
        self.file_size = file_size

    def __enter__(self):
        return self

    def __exit__(self, exception_type, exception_value, exception_traceback):
        self.close()

    def close(self) -> None:
        self.fileobj.close()

    def fileno(self) -> int:
        # This is a virtual Python level file object and therefore does not have a valid OS file descriptor!
        raise io.UnsupportedOperation()

    def seekable(self) -> bool:
        return self.fileobj.seekable()

    def readable(self) -> bool:
        return self.fileobj.readable()

    def writable(self) -> bool:
        return False

    def read(self, size: int = -1) -> bytes:
        return self.fileobj.read(size)

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        # Normalize every seek to an absolute offset
        if whence == io.SEEK_CUR:
            offset += self.tell()
        elif whence == io.SEEK_END:
            offset += self.file_size

        # Forward seeks work on the existing file object
        if offset >= self.tell():
            return self.fileobj.seek(offset, io.SEEK_SET)

        # Backward seek: start over with a fresh file object and seek forward
        self.fileobj = self.reopen()
        return self.fileobj.seek(offset, io.SEEK_SET)

    def tell(self) -> int:
        return self.fileobj.tell()
Replacing the rar.open("bar") call in my minimal non-working examples with the following two lines makes them run just fine:
info = rar.getinfo("bar")
file = RawFileInsideRar(lambda: rar.open(info), info.file_size)
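For anyone who wants to try the reopen-on-backward-seek idea without a rar binary, here is a rough, self-contained sketch of the same pattern. ForwardOnlyStream and ReopenOnRewind are hypothetical names I made up for illustration; the former is just a stand-in for a decompressing stream that cannot go backwards, and none of this is rarfile's actual implementation.

```python
import io


class ForwardOnlyStream(io.RawIOBase):
    """Hypothetical stand-in for a decompressing stream: forward seeks only."""

    def __init__(self, payload: bytes):
        self._buf = io.BytesIO(payload)

    def readable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._buf.tell()

    def read(self, size: int = -1) -> bytes:
        return self._buf.read(size)

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        # Only absolute, non-decreasing offsets are allowed.
        if whence != io.SEEK_SET or offset < self._buf.tell():
            raise io.UnsupportedOperation("backward seek")
        return self._buf.seek(offset)


class ReopenOnRewind:
    """Emulates backward seeks by discarding the stream and opening a new one."""

    def __init__(self, reopen):
        self._reopen = reopen      # callable returning a fresh stream at offset 0
        self._stream = reopen()
        self.reopen_count = 0      # only here to make the behaviour observable

    def read(self, size: int = -1) -> bytes:
        return self._stream.read(size)

    def tell(self) -> int:
        return self._stream.tell()

    def seek(self, offset: int) -> int:
        if offset < self._stream.tell():
            # Backward seek: close the old stream, start over, then seek forward.
            self._stream.close()
            self._stream = self._reopen()
            self.reopen_count += 1
        return self._stream.seek(offset)


f = ReopenOnRewind(lambda: ForwardOnlyStream(b"foo\n"))
first = f.read(2)   # consume two bytes so that seek(1) is a backward seek
f.seek(1)
rest = f.read()
print(first, rest, f.reopen_count)
```

Unlike my workaround above, this sketch closes the old stream before reopening, which seemed like the tidier choice to me but is not required for the pattern to work.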
Thanks for the report!