indexed_bzip2

Don't simply quit on bad CRC?

Open mxmlnkn opened this issue 5 years ago • 5 comments

To read from faulty media, it might be helpful not to just "crash" with an exception when a bad CRC is encountered. I made it simply quit because, when a CRC is wrong, I can no longer be sure that the following bits belong to the next block. However, for the parallel version I added a blockfinder function, which can search for the magic bit strings of bzip2 blocks. I could use that to recover from bad blocks. Ironically, this is not the only new feature that I could get out of the box from the parallelized design.
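To illustrate the idea of searching for block magic bit strings, here is a minimal sketch. It is not the library's actual blockfinder; the function name `find_block_offsets` is hypothetical, and the bit-by-bit loop is chosen for clarity, not speed. It relies only on the bzip2 format itself: every block starts with the 48-bit magic 0x314159265359, stored most significant bit first, and blocks are generally not byte-aligned, so the search has to run over bit offsets.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # 48-bit magic that starts every bzip2 block
MAGIC_BITS = 48


def find_block_offsets(data: bytes) -> list:
    """Return the bit offsets at which the block magic occurs in data.

    Hypothetical helper for illustration only. A match can also be a
    false positive inside compressed data, so a real recovery routine
    would still have to try decoding from each candidate offset.
    """
    offsets = []
    window = 0
    mask = (1 << MAGIC_BITS) - 1
    for bit_index in range(len(data) * 8):
        byte = data[bit_index >> 3]
        bit = (byte >> (7 - (bit_index & 7))) & 1  # bzip2 is MSB-first
        window = ((window << 1) | bit) & mask
        if bit_index + 1 >= MAGIC_BITS and window == BLOCK_MAGIC:
            offsets.append(bit_index + 1 - MAGIC_BITS)
    return offsets


if __name__ == "__main__":
    compressed = bz2.compress(b"hello world")
    print(find_block_offsets(compressed))
```

In a valid stream, the first block follows the 4-byte stream header ("BZh" plus the level digit), so bit offset 32 should be among the results. After a bad CRC, decoding could in principle resume at the next candidate offset instead of aborting.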

How should errors be reported then? A simple message to stderr?

mxmlnkn, Oct 04 '20 11:10

Sometimes I get an incomplete bz2 file, and indexed_bzip2 hangs so that my program can't continue. The standard bz2 package raises an error in this case.

byphilipp, Jun 05 '25 13:06

It hangs? How exactly are you calling it? This probably is a slightly different problem, especially as the checksum is at the end, so it is not available when the archive is incomplete.

mxmlnkn, Jun 05 '25 13:06

I'm calling it from Python:

import indexed_bzip2
with indexed_bzip2.open( bzfile, parallelization=6 ) as source:
    memfile = source.read()

The program then stops responding, whereas bz2 raises an exception:

import bz2
with open(bzfile, 'rb') as source:
    memfile = bz2.decompress(source.read())
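
For comparison, the standard-library behaviour on a truncated stream can be checked in isolation. This is only a sketch: the payload and the helper name `try_decompress` are made up for illustration, and the truncation simply cuts the compressed bytes in half.

```python
import bz2


def try_decompress(data: bytes):
    """Return (decompressed bytes, None) on success or (None, error) on failure.

    Hypothetical helper for illustration. bz2.decompress raises ValueError
    when the data ends before the end-of-stream marker, and OSError when
    the data is not a valid bzip2 stream.
    """
    try:
        return bz2.decompress(data), None
    except (ValueError, OSError) as error:
        return None, error


payload = bytes(range(256)) * 64          # arbitrary test data
compressed = bz2.compress(payload)
truncated = compressed[: len(compressed) // 2]

if __name__ == "__main__":
    print(try_decompress(compressed)[1])  # no error for the complete stream
    print(try_decompress(truncated)[1])   # error object for the truncated one
```

The truncated input fails immediately with an exception instead of blocking, which is the behaviour described for bz2 above.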

byphilipp, Jun 05 '25 13:06

I cannot reproduce it. I tried with files generated like so:

base64 /dev/urandom | head -c 1024 | bzip2 | head -c 100 > base64.bz2.truncated
base64 /dev/urandom | head -c $(( 32 * 1024 * 1024 )) | bzip2 | head -c $(( 8 * 1024 * 1024 )) > base64.bz2.truncated

and with bzfile in your script being replaced with the path:

import indexed_bzip2
with indexed_bzip2.open( "base64.bz2.truncated", parallelization=6 ) as source:
    memfile = source.read()

For both, I get:

> python3 issue-7.py 
Traceback (most recent call last):
  File "issue-7.py", line 6, in <module>
    memfile = source.read()
              ^^^^^^^^^^^^^
  File "indexed_bzip2.pyx", line 251, in indexed_bzip2._IndexedBzip2FileParallel.readinto
RuntimeError: std::exception
> echo $?
1

I.e., it does not hang. Are you using your script in a pipe? Is your bzfile a file object or a path?

mxmlnkn, Jun 05 '25 14:06

Thank you so much. My problem was related to the inode limit on Ubuntu.

byphilipp, Jun 05 '25 22:06