readall() raises Bad7zFile: CRC32 error

Open DoNck opened this issue 3 years ago • 13 comments

Describe the bug readall() raises Bad7zFile: CRC32 error

To Reproduce Steps to reproduce the behavior:

  1. download and unzip tests.zip
  2. cd to the tests directory
  3. run pip install py7zr
  4. run python python_7z.py ok.7z: the script runs fine and lists the content found in the provided nested archive.
  5. run python python_7z.py ko.7z: the script fails instead of listing the content:
$ python python_7z.py ko.7z
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 604, in _extract
    self.worker.extract(
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 1183, in extract
    self.extract_single(
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 1276, in extract_single
    raise e
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 1264, in extract_single
    raise CrcError("{}".format(f.filename))
py7zr.exceptions.CrcError: EventConsumer.txt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "python_7z.py", line 19, in <module>
    list_archive(top_archive_name, top_archive)
  File "python_7z.py", line 11, in list_archive
    for filename, file_content in archive.readall().items():
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 940, in readall
    return self._extract(path=None, return_dict=True)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\py7zr\py7zr.py", line 609, in _extract
    raise Bad7zFile("CRC32 error on archived file {}.".format(str(ce)))
py7zr.exceptions.Bad7zFile: CRC32 error on archived file EventConsumer.txt.

Expected behavior List (nested) archive(s) content recursively.

Environment (please complete the following information):

  • OS: Windows 10
  • Python: 3.8.8
  • py7zr version: 0.16.1, installed by pip

Test data (please attach in the report): ok.7z and ko.7z are attached inside the zip file, along with the Python script (GitHub does not allow .7z uploads).

Additional context Both sample archives extract fine from 7z-FM 19.00 (x86).

DoNck avatar Sep 14 '21 07:09 DoNck

ko.7z has the following header properties:

emptyfiles  = [True, True, True]
files = [
  {'emptystream': False, 'filename': 'Config.xml', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': True, 'filename': 'EventConsumer.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': False, 'filename': 'EventConsumer.txt', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128}, 
  {'emptystream': False, 'filename': 'processes1.csv', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': True, 'filename': 'processes1.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': False, 'filename': 'processes2.csv', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': True, 'filename': 'processes2.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128}
]
packsizes = [3100, 1, 21160, 1, 18794, 1]
unpacksizes = [28360, 0, 1068, 136588, 0, 124443, 0]
num_unpackstream_folders = [1, 0, 2, 0, 1, 0]
digests = [3531454146, 1149430100, 529556584, 1488982218]

miurahr avatar Sep 17 '21 00:09 miurahr

Looking at the properties of 'EventConsumer.log', something is odd.

The following values mean that the entry has a packed stream of 1 byte but an extracted size of zero:

packsize = 1
unpacksize = 0
num_unpackstream_folders = 0

but the file definition says there is no stream of packed data:

emptystream = True

These contradict each other.
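
The mismatch can be checked mechanically from the values dumped earlier in the thread. The following is a minimal sketch, not py7zr code: the lists are copied from the ko.7z header dump, and the helper name `folders_with_pack_but_no_streams` is made up for illustration. It assumes, as the comment above does, that `packsizes` and `num_unpackstream_folders` line up index by index.

```python
# Values copied from the ko.7z header dump in this thread.
emptystream = [False, True, False, False, True, False, True]  # one flag per file entry
packsizes = [3100, 1, 21160, 1, 18794, 1]
num_unpackstream_folders = [1, 0, 2, 0, 1, 0]
digests = [3531454146, 1149430100, 529556584, 1488982218]


def folders_with_pack_but_no_streams(packsizes, num_unpackstreams):
    """Return indices of folders that have packed bytes but declare zero unpack streams.

    Hypothetical helper for illustration; it is not part of py7zr.
    Assumes both lists are aligned index by index.
    """
    return [
        index
        for index, (packed, streams) in enumerate(zip(packsizes, num_unpackstreams))
        if packed > 0 and streams == 0
    ]


suspicious = folders_with_pack_but_no_streams(packsizes, num_unpackstream_folders)
non_empty_files = emptystream.count(False)

print("folders with packed data but no unpack stream:", suspicious)
print("non-empty files:", non_empty_files)
print("declared unpack streams:", sum(num_unpackstream_folders))
print("digests:", len(digests))
```

With these values, folders 1, 3 and 5 are flagged, while the number of non-empty files, the number of declared unpack streams and the number of digests all agree at 4.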

miurahr avatar Sep 17 '21 00:09 miurahr

The file entries listed by the 7z command are:

Blocks = 6

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2021-09-08 23:08:33 .....        28360         3100  Config.xml
2021-09-08 23:08:33 .....            0            0  EventConsumer.log
2021-09-08 23:08:33 .....         1068        21160  EventConsumer.txt
2021-09-08 23:08:33 .....       136588               processes1.csv
2021-09-08 23:08:33 .....            0            0  processes1.log
2021-09-08 23:08:33 .....       124443        18794  processes2.csv
2021-09-08 23:08:33 .....            0            0  processes2.log
------------------- ----- ------------ ------------  ------------------------
2021-09-08 23:08:33             290459        43054  7 files

miurahr avatar Sep 17 '21 00:09 miurahr

@DoNck How did you produce the data that triggers the issue? It is actually a bug in both the producer and the extractor.

miurahr avatar Sep 19 '21 12:09 miurahr

Hi @miurahr, thank you for your quick support. This archive is produced by this tool. I don't know if you tried, but I forgot to mention: both provided archives extract fine with the 7z CLI and GUI tools. Could py7zr offer the same support even though something is wrong on the archive producer side?

DoNck avatar Sep 20 '21 08:09 DoNck

Hi all! The CrcError is raised on the EventConsumer.txt file (non-empty), not EventConsumer.log (empty). I don't know if this changes anything, but I just wanted to check that we are on the same page.

ko.7z has the following header properties:

emptyfiles  = [True, True, True]
files = [
  {'emptystream': False, 'filename': 'Config.xml', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': True, 'filename': 'EventConsumer.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': False, 'filename': 'EventConsumer.txt', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128}, 
  {'emptystream': False, 'filename': 'processes1.csv', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': True, 'filename': 'processes1.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': False, 'filename': 'processes2.csv', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128},
  {'emptystream': True, 'filename': 'processes2.log', 'lastwritetime': ArchiveTimestamp(132755837132590000), 'attributes': 128}
]
packsizes = [3100, 1, 21160, 1, 18794, 1]
unpacksizes = [28360, 0, 1068, 136588, 0, 124443, 0]
num_unpackstream_folders = [1, 0, 2, 0, 1, 0]
digests = [3531454146, 1149430100, 529556584, 1488982218]

From the above output, the CRC32 values are valid for all the non-empty files, but they are missing for the empty files (4 digests for 7 files). When I run the code with the exception raise patched to print some info instead (at https://github.com/miurahr/py7zr/blob/f579fc67ea77c12052083034254a3ced341e5412/py7zr/py7zr.py#L1264), this is what I get:

CRCError ! EventConsumer.txt: expected=1149430100, got=0
CRCError ! processes1.csv: expected=529556584, got=1149430100
CRCError ! processes2.csv: expected=1488982218, got=529556584
...

The CRC32 values are still valid but offset by one entry: each computed value matches the digest expected for the previous non-empty file.
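
The offset can be checked directly from the three reported lines. The sketch below is illustrative only: the tuples are copied from the patched output above, and `is_shifted_by_one` is a hypothetical helper, not a py7zr function. It also checks that `zlib.crc32` of empty input is 0, which is consistent with the first `got=0` having been computed over zero bytes.

```python
import zlib

# (filename, expected CRC32, computed CRC32), copied from the patched output above.
observed = [
    ("EventConsumer.txt", 1149430100, 0),
    ("processes1.csv", 529556584, 1149430100),
    ("processes2.csv", 1488982218, 529556584),
]


def is_shifted_by_one(rows):
    """True if each computed CRC equals the expected CRC of the previous row.

    Hypothetical helper for illustration; it is not part of py7zr.
    """
    return all(
        current[2] == previous[1]
        for previous, current in zip(rows, rows[1:])
    )


empty_crc = zlib.crc32(b"")  # CRC32 of zero bytes

print("computed CRCs lag expected CRCs by one entry:", is_shifted_by_one(observed))
print("first computed CRC equals CRC32 of empty data:", observed[0][2] == empty_crc)
```

Both checks hold for the reported values. That suggests, without proving it, that the extractor hands each file the data of the preceding stream, starting with an empty one.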

We will be investigating our usage of the 7z library in https://github.com/DFIR-ORC/dfir-orc/issues/49, but maybe there is something we are both missing in the handling of empty files and their CRC32? What do you think, @miurahr?

sc-anssi avatar Sep 20 '21 14:09 sc-anssi

Hi, is there any update on this issue?

Best regards

DoNck avatar Oct 05 '21 08:10 DoNck

This is a real corner case and difficult to analyze. It is still under investigation.

@DoNck If you have any findings, comments are welcome!

miurahr avatar Oct 07 '21 12:10 miurahr

Hi, we fixed (DFIR-ORC/dfir-orc@7d8bf430cb8cc22216d1f788ef41a3a42fbf0d97) the handling of empty streams added to an archive so that it matches what is done for empty files. However, 7z does not really specify that it should be handled one way or the other, so it may be worth having py7zr handle both cases, as the 7z CLI seems to do. Regards

sc-anssi avatar Dec 13 '21 14:12 sc-anssi

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

github-actions[bot] avatar Mar 14 '22 00:03 github-actions[bot]
