python-snappy

Decompress Chunk Truncated error

Open · mayurpande opened this issue 4 years ago · 20 comments

I have tried to follow the readme and run this command from my terminal, like so:

python -m snappy -d temp.snappy temp.txt

However, I get the error UncompressError: chunk truncated.

mayurpande · Dec 04 '20 16:12

Also, when I try to use it within my script, it fails with:

snappy.UncompressError: Error while decompressing: invalid input

However, I am passing it a bytes-like object:

    import snappy

    with open('libs/temp.snappy', 'rb') as f:
        data = f.read()
        snappy.uncompress(data)  # raises snappy.UncompressError: Error while decompressing: invalid input

mayurpande · Dec 04 '20 16:12

How did you make the file? Note the difference between decompress and stream_decompress.
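
For reference, the framed format operates on file-like objects rather than a single bytes buffer. A minimal sketch of the distinction (the path is the one from your snippet; only one of the two calls will typically succeed, depending on how the file was written):

    import io
    import snappy

    with open('libs/temp.snappy', 'rb') as f:
        data = f.read()

    # raw block format: the whole buffer is one snappy block
    # out = snappy.uncompress(data)

    # framing (stream) format: reads from and writes to file-like objects
    dst = io.BytesIO()
    snappy.stream_decompress(io.BytesIO(data), dst)
    out = dst.getvalue()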

martindurant · Dec 04 '20 16:12

It is compressed in S3, then I download it.

mayurpande · Dec 04 '20 16:12

It is compressed in S3

How was it created?

martindurant · Dec 04 '20 16:12

It is compressed in S3

How was it created?

It is fed through Firehose, and then Firehose handles the compression.

mayurpande · Dec 04 '20 16:12

So, this library has three decompress functions; you should try each. Otherwise, you will need to get the details of what Firehose is doing for you. This isn't a parquet file, right?
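
A rough "try each" sketch (hedged: it assumes stream_decompress and hadoop_stream_decompress take file-like src and dst arguments, as the readme shows for the stream functions):

    import io
    import snappy

    def framed(data):
        dst = io.BytesIO()
        snappy.stream_decompress(io.BytesIO(data), dst)
        return dst.getvalue()

    def hadoop(data):
        dst = io.BytesIO()
        snappy.hadoop_stream_decompress(io.BytesIO(data), dst)
        return dst.getvalue()

    with open('libs/temp.snappy', 'rb') as f:
        data = f.read()

    # try the raw, framed and hadoop-compatible decoders in turn
    for name, fn in [('raw', snappy.uncompress), ('framed', framed), ('hadoop', hadoop)]:
        try:
            print(name, 'ok:', len(fn(data)), 'bytes')
        except Exception as exc:
            print(name, 'failed:', exc)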

martindurant · Dec 04 '20 16:12

So, this library has three decompress functions; you should try each. Otherwise, you will need to get the details of what Firehose is doing for you. This isn't a parquet file, right?

No, it isn't parquet. As per the information for Firehose, it says:

S3 compression and encryption: Kinesis Data Firehose can compress records before delivering them to your S3 bucket. Compressed records can also be encrypted in the S3 bucket using a KMS master key.

So I have tried uncompress and stream_compress, and both do not work. I will try decompress now.

mayurpande · Dec 04 '20 16:12

Kinesis Data Firehose can compress records before delivering them to your S3 bucket.

Sorry, that doesn't give us much to work from. Also, don't forget hadoop_stream_decompress.

martindurant · Dec 04 '20 16:12

Sorry, that doesn't give us much to work from. Also, don't forget hadoop_stream_decompress.

It just means that it compresses into a specified format, with the ability to choose from:

  • Disabled
  • GZIP
  • Snappy
  • Zip
  • Hadoop-Compatible Snappy

mayurpande · Dec 04 '20 16:12

Also, don't forget hadoop_stream_decompress.

    import snappy

    with open('libs/temp.snappy', 'rb') as f:
        data = f.read()
        decom = snappy.hadoop_snappy.StreamDecompressor()
        un = decom.decompress(data)  # returns an empty bytes string here

This is the only thing that didn't throw an error; however, it returns an empty bytes string. But when I use the macOS snzip command-line tool, it uncompresses the file.

mayurpande · Dec 04 '20 17:12

I have tried to follow the readme and run this command from my terminal, like so:

python -m snappy -d temp.snappy temp.txt

However, I get the error UncompressError: chunk truncated.

Also, there is still the issue of this. Not too sure why none of the methods are working, but as I have mentioned, I am able to use snzip from my command line.

mayurpande · Dec 04 '20 17:12

So I have tried uncompress and stream_compress

You meant stream_decompress? Sounds like that should be the one, guessing from the snzip readme.

martindurant · Dec 04 '20 17:12

You meant stream_decompress?

Yes, it still doesn't work; it throws:

snappy.UncompressError: stream missing snappy identifier

I tried to do what they did here, but it still did not work.

Sounds like that should be the one, guessing from the snzip readme.

I am just going to use a subprocess call to snzip, as that works. If there is a fix for this, please let me know.
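
Roughly what that fallback looks like (a hedged sketch: the -d and -c flags follow snzip's gzip-style interface and should be checked against snzip -h on your install):

    import subprocess

    # decompress via snzip: -d decompresses, -c writes the result to stdout
    # (flag behaviour is an assumption here; verify with the installed snzip)
    result = subprocess.run(
        ['snzip', '-d', '-c', 'libs/temp.snappy'],
        check=True, capture_output=True,
    )
    data = result.stdout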

mayurpande · Dec 04 '20 17:12

Sorry, I don't have any more suggestions for you. Perhaps someone else does.

martindurant · Dec 04 '20 17:12

Sorry, I don't have any more suggestions for you. Perhaps someone else does.

No problem, thank you for the guidance anyways. My colleague was having the same issue with chunk truncated as well.

mayurpande · Dec 04 '20 18:12

Anyone with any ideas for this?

mayurpande · Dec 17 '20 11:12

I have this issue as well. Any feedback is appreciated.

randomtask2000 · Jun 08 '21 03:06

You might want to try the package cramjam, which has cramjam.snappy.decompress and cramjam.snappy.decompress_raw for the framed and unframed formats, respectively. I don't believe it has a CLI, but you could request one.
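
For example, something along these lines (a hedged sketch: cramjam's snappy functions return a buffer object that bytes() can consume; the broad except here is just to fall back between the two formats):

    import cramjam

    with open('libs/temp.snappy', 'rb') as f:
        data = f.read()

    try:
        # framed (streaming) snappy format
        out = bytes(cramjam.snappy.decompress(data))
    except Exception:
        # raw / unframed snappy format
        out = bytes(cramjam.snappy.decompress_raw(data))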

martindurant · Jun 08 '21 13:06

@mayurpande

Sorry to bother you, but I have the same problem trying to read a Kinesis Firehose snappy file. Did you find a way to uncompress it?

Regards

omendoza-itera · Aug 23 '22 09:08

I don't believe it has a CLI, but you could request one.

Just released one, pip install cramjam-cli for anyone interested.

milesgranger · May 06 '23 14:05