py7zr
readall() raises Bad7zFile: CRC32 error
Describe the bug: readall() raises Bad7zFile: CRC32 error.
To Reproduce
Steps to reproduce the behavior:
- download and unzip tests.zip
- cd to the tests directory
- run pip install py7zr
- run:
  python python_7z.py ok.7z
  The script should run fine and list the content found in the provided nested archive.
- run:
  python python_7z.py ko.7z
  The script fails instead of listing the content:
$ python python_7z.py ko.7z
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 604, in _extract
self.worker.extract(
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 1183, in extract
self.extract_single(
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 1276, in extract_single
raise e
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 1264, in extract_single
raise CrcError("{}".format(f.filename))
py7zr.exceptions.CrcError: EventConsumer.txt
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "python_7z.py", line 19, in <module>
list_archive(top_archive_name, top_archive)
File "python_7z.py", line 11, in list_archive
for filename, file_content in archive.readall().items():
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 940, in readall
return self._extract(path=None, return_dict=True)
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 609, in _extract
raise Bad7zFile("CRC32 error on archived file {}.".format(str(ce)))
py7zr.exceptions.Bad7zFile: CRC32 error on archived file EventConsumer.txt.
Expected behavior: list the content of the (nested) archive(s) recursively.
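The report does not reproduce python_7z.py itself; the traceback only shows that it calls list_archive(top_archive_name, top_archive) and iterates over archive.readall().items(). A minimal sketch of such a recursive lister, under those assumptions, could look like this. The reader is injectable so the walk can be tried without real archives, nested archives are assumed to be recognised by their .7z suffix, and read_with_py7zr targets the py7zr 0.16-era readall() API used in the report (later py7zr releases may behave differently). None of these names come from the original script.

```python
from typing import BinaryIO, Callable, Dict, List

# A reader takes an archive file object and returns {member name: file object}.
Reader = Callable[[BinaryIO], Dict[str, BinaryIO]]


def read_with_py7zr(fileobj: BinaryIO) -> Dict[str, BinaryIO]:
    """Decompress every member with py7zr's readall() (0.16-era API, as in the report)."""
    import py7zr  # imported lazily so the walker below can be exercised without py7zr

    with py7zr.SevenZipFile(fileobj, mode="r") as archive:
        # On ko.7z, this is the call that ends in py7zr.exceptions.Bad7zFile.
        return archive.readall()


def list_archive(name: str, fileobj: BinaryIO, reader: Reader = read_with_py7zr) -> List[str]:
    """Return the paths of all members, descending into members that look like .7z archives."""
    paths: List[str] = []
    for member, content in reader(fileobj).items():
        path = f"{name}/{member}"
        paths.append(path)
        # Assumption: nested archives are recognised by their file extension only.
        if member.lower().endswith(".7z"):
            content.seek(0)  # rewind in case the reader left the position at the end
            paths.extend(list_archive(path, content, reader))
    return paths
```

With py7zr installed, something like `with open("ko.7z", "rb") as f: print(list_archive("ko.7z", f))` would be expected to surface the same Bad7zFile error on 0.16.1, while ok.7z lists normally.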
Environment:
- OS: Windows 10
- Python 3.8.8
- py7zr version: 0.16.1, installed by pip
Test data: ok.7z and ko.7z are attached inside the zip file, along with the Python script (.7z files cannot be uploaded to GitHub).
Additional context: Both sample archives extract fine with 7z-FM 19.00 (x86).
ko.7z has the following header properties:
emptyfiles = [True, True, True]
files = [
{'emptystream': False, 'filename': 'Config.xml', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
{'emptystream': True, 'filename': 'EventConsumer.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
{'emptystream': False, 'filename': 'EventConsumer.txt', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
{'emptystream': False, 'filename': 'processes1.csv', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
{'emptystream': True, 'filename': 'processes1.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
{'emptystream': False, 'filename': 'processes2.csv', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
{'emptystream': True, 'filename': 'processes2.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128}
]
packsizes = [3100, 1, 21160, 1, 18794, 1]
unpacksizes = [28360, 0, 1068, 136588, 0, 124443, 0]
num_unpackstream_folders = [1, 0, 2, 0, 1, 0]
digests = [3531454146, 1149430100, 529556584, 1488982218]
The properties of 'EventConsumer.log' look odd.
The following values mean that the file has a packed stream of 1 byte, but an extracted size of zero:
packsize = 1
unpacksize = 0
num_unpackstream_folders = 0
but the definition says there is no stream of packed data:
emptystream = True
These values contradict each other.
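The contradiction can be checked mechanically. The sketch below uses a hypothetical helper (not part of py7zr) that scans the folder-level values from the dump and reports every folder that has packed bytes but declares zero unpack streams. For ko.7z it flags three folders, the same count as the three entries marked emptystream = True.

```python
from typing import List, Sequence

# Folder-level values copied from the ko.7z header dump above.
packsizes = [3100, 1, 21160, 1, 18794, 1]
num_unpackstream_folders = [1, 0, 2, 0, 1, 0]


def contradictory_folders(packsizes: Sequence[int], num_unpackstreams: Sequence[int]) -> List[int]:
    """Indices of folders that hold packed bytes yet declare no unpacked stream."""
    return [
        index
        for index, (packed, streams) in enumerate(zip(packsizes, num_unpackstreams))
        if packed > 0 and streams == 0
    ]


print(contradictory_folders(packsizes, num_unpackstream_folders))  # [1, 3, 5]
```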
The file entries listed by the 7z command are:
Blocks = 6
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2021-09-08 23:08:33 .....        28360         3100  Config.xml
2021-09-08 23:08:33 .....            0            0  EventConsumer.log
2021-09-08 23:08:33 .....         1068        21160  EventConsumer.txt
2021-09-08 23:08:33 .....       136588               processes1.csv
2021-09-08 23:08:33 .....            0            0  processes1.log
2021-09-08 23:08:33 .....       124443        18794  processes2.csv
2021-09-08 23:08:33 .....            0            0  processes2.log
------------------- ----- ------------ ------------  ------------------------
2021-09-08 23:08:33             290459        43054  7 files
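The listing lines up with the header dump once files are grouped into folders (the "Blocks" above). A minimal sketch, assuming non-empty files are handed to folders in header order according to num_unpackstream_folders: it yields six folders (matching Blocks = 6), places EventConsumer.txt and processes1.csv in the same folder, and attributes each folder's packed size to its first file, which appears to be why processes1.csv shows no Compressed value. The three 1-byte folders receive no file at all, and the Compressed total (43054 = 3100 + 21160 + 18794) leaves them out. The helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (filename, emptystream) in header order, copied from the ko.7z dump.
files = [
    ("Config.xml", False),
    ("EventConsumer.log", True),
    ("EventConsumer.txt", False),
    ("processes1.csv", False),
    ("processes1.log", True),
    ("processes2.csv", False),
    ("processes2.log", True),
]
packsizes = [3100, 1, 21160, 1, 18794, 1]
num_unpackstream_folders = [1, 0, 2, 0, 1, 0]


def files_per_folder(files: Sequence[Tuple[str, bool]], num_unpackstreams: Sequence[int]) -> List[List[str]]:
    """Hand out non-empty files, in order, to folders according to their unpack stream counts."""
    streams = iter(name for name, empty in files if not empty)
    return [[next(streams) for _ in range(count)] for count in num_unpackstreams]


def compressed_column(folders: List[List[str]], packsizes: Sequence[int]) -> Dict[str, int]:
    """Attribute each folder's packed size to the first file it contains."""
    return {members[0]: packed for members, packed in zip(folders, packsizes) if members}


folders = files_per_folder(files, num_unpackstream_folders)
print(len(folders))  # 6
print(folders)
# [['Config.xml'], [], ['EventConsumer.txt', 'processes1.csv'], [], ['processes2.csv'], []]
print(compressed_column(folders, packsizes))
# {'Config.xml': 3100, 'EventConsumer.txt': 21160, 'processes2.csv': 18794}
```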
@DoNck How did you produce the data that triggers the issue? It is actually a bug in both the producer and the extractor.
Hi @miurahr, thank you for your quick support. This archive is produced by this tool. I don't know if you tried, but I forgot to mention: both provided archives extract fine with the 7z CLI and GUI tools. Could the py7zr library offer the same support despite something being wrong on the archive producer side?
Hi all! The CrcError is raised on the EventConsumer.txt file (non-empty), not on EventConsumer.log (empty). I don't know if this changes anything, but I just wanted to check that we are on the same page.
From the output above, the CRC32 values are present for all the non-empty files but missing for the empty files (4 digests for 7 files). When I run the code with the exception-raising line patched to print some information instead (in https://github.com/miurahr/py7zr/blob/f579fc67ea77c12052083034254a3ced341e5412/py7zr/py7zr.py#L1264), this is what I get:
CRCError ! EventConsumer.txt: expected=1149430100, got=0
CRCError ! processes1.csv: expected=529556584, got=1149430100
CRCError ! processes2.csv: expected=1488982218, got=529556584
...
The CRC32 values are still valid, but offset by one entry: each "got" value is the "expected" value of the preceding file.
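The offset can be confirmed from the printed values alone. A small sketch, using hypothetical helper names: shifted_by_one checks that each "got" equals the previous entry's "expected", and crc_matches shows the kind of comparison that fails here (it is not py7zr's actual code). Note that got=0 for EventConsumer.txt is exactly zlib.crc32(b""); if "got" is the CRC32 computed over the extracted bytes, this suggests the data checked for EventConsumer.txt was empty.

```python
import zlib
from typing import Sequence, Tuple

# (filename, expected, got) as printed by the patched code above.
reports = [
    ("EventConsumer.txt", 1149430100, 0),
    ("processes1.csv", 529556584, 1149430100),
    ("processes2.csv", 1488982218, 529556584),
]


def crc_matches(data: bytes, expected: int) -> bool:
    """Check decompressed member data against a stored CRC32 digest."""
    return zlib.crc32(data) == expected


def shifted_by_one(reports: Sequence[Tuple[str, int, int]]) -> bool:
    """True if every 'got' value equals the previous entry's 'expected' value."""
    return all(cur[2] == prev[1] for prev, cur in zip(reports, reports[1:]))


print(zlib.crc32(b""))  # 0, the CRC32 of an empty byte string
print(shifted_by_one(reports))  # True
```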
We will be investigating our usage of the 7z library in https://github.com/DFIR-ORC/dfir-orc/issues/49, but maybe there is something we are (both) missing in the handling of empty files and their CRC32? What do you think, @miurahr?
Hi, are there any updates regarding this situation?
Best regards
This is a very rare corner case and difficult to analyze. It is still under investigation.
@DoNck If you have any findings, comments are welcome!
Hi, we fixed the handling of empty streams added to an archive (DFIR-ORC/dfir-orc@7d8bf430cb8cc22216d1f788ef41a3a42fbf0d97) so that it matches what is done for empty files. However, the 7z format does not really specify which way it should be handled, so it might be worth having py7zr handle both cases, as the 7z CLI seems to do.
Regards
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days