python-snappy

Decompress Chunk Truncated error

Open · mayurpande opened this issue 4 years ago · 20 comments

I have tried to follow the readme and run this command from my terminal, like so:

python -m snappy -d temp.snappy temp.txt

However, I get the error UncompressError: chunk truncated.

mayurpande · Dec 04 '20 16:12

Also, when I try to use it within my script, it fails with:

snappy.UncompressError: Error while decompressing: invalid input

However, I am passing it a bytes-like object:

    import snappy

    with open('libs/temp.snappy', 'rb') as f:
        data = f.read()
        snappy.uncompress(data)  # raises snappy.UncompressError: Error while decompressing: invalid input

mayurpande · Dec 04 '20 16:12

How did you make the file? Note the difference between decompress and stream_decompress.
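
For reference, the framed format operates on file-like objects rather than a single bytes buffer. A minimal sketch of the distinction (the path is the one from your snippet; only one of the two calls will typically succeed, depending on how the file was written):

    import io
    import snappy

    with open('libs/temp.snappy', 'rb') as f:
        data = f.read()

    # raw block format: the whole buffer is one snappy block
    # out = snappy.uncompress(data)

    # framing (stream) format: reads from and writes to file-like objects
    dst = io.BytesIO()
    snappy.stream_decompress(io.BytesIO(data), dst)
    out = dst.getvalue()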

martindurant · Dec 04 '20 16:12

It is compressed in S3, then I download it.

mayurpande · Dec 04 '20 16:12

It is compressed in S3

How was it created?

martindurant · Dec 04 '20 16:12

It is compressed in S3

How was it created?

It is fed through Firehose, and then Firehose handles the compression.

mayurpande · Dec 04 '20 16:12

So, this library has three decompress functions; you should try each. Otherwise, you will need to get the details of what Firehose is doing for you. This isn't a parquet file, right?
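
A rough "try each" sketch (hedged: it assumes stream_decompress and hadoop_stream_decompress take file-like src and dst arguments, as the readme shows for the stream functions):

    import io
    import snappy

    def framed(data):
        dst = io.BytesIO()
        snappy.stream_decompress(io.BytesIO(data), dst)
        return dst.getvalue()

    def hadoop(data):
        dst = io.BytesIO()
        snappy.hadoop_stream_decompress(io.BytesIO(data), dst)
        return dst.getvalue()

    with open('libs/temp.snappy', 'rb') as f:
        data = f.read()

    # try the raw, framed and hadoop-compatible decoders in turn
    for name, fn in [('raw', snappy.uncompress), ('framed', framed), ('hadoop', hadoop)]:
        try:
            print(name, 'ok:', len(fn(data)), 'bytes')
        except Exception as exc:
            print(name, 'failed:', exc)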

martindurant · Dec 04 '20 16:12

So, this library has three decompress functions; you should try each. Otherwise, you will need to get the details of what Firehose is doing for you. This isn't a parquet file, right?

No, it isn't parquet. As per the information for Firehose, it says:

S3 compression and encryption: Kinesis Data Firehose can compress records before delivering them to your S3 bucket. Compressed records can also be encrypted in the S3 bucket using a KMS master key.

So I have tried uncompress and stream_compress, and both do not work. I will try decompress now.

mayurpande · Dec 04 '20 16:12

Kinesis Data Firehose can compress records before delivering them to your S3 bucket.

Sorry, that doesn't give us much to work from. Also, don't forget hadoop_stream_decompress.

martindurant · Dec 04 '20 16:12

Sorry, that doesn't give us much to work from. Also, don't forget hadoop_stream_decompress.

It just means that it compresses into a specified format, with the ability to choose from:

  • Disabled
  • GZIP
  • Snappy
  • Zip
  • Hadoop-Compatible Snappy

mayurpande · Dec 04 '20 16:12

Also, don't forget hadoop_stream_decompress.

    import snappy

    with open('libs/temp.snappy', 'rb') as f:
        data = f.read()
        decom = snappy.hadoop_snappy.StreamDecompressor()
        un = decom.decompress(data)  # returns an empty bytes string here

This is the only thing that didn't throw an error; however, it returns an empty bytes string. But when I use the macOS snzip command-line tool, it uncompresses the file.

mayurpande · Dec 04 '20 17:12

I have tried to follow the readme and run this command from my terminal, like so:

python -m snappy -d temp.snappy temp.txt

However, I get the error UncompressError: chunk truncated.

Also, there is still the issue of this. Not too sure why none of the methods are working, but as I have mentioned, I am able to use snzip from my command line.

mayurpande · Dec 04 '20 17:12

So I have tried uncompress and stream_compress

You meant stream_decompress? Sounds like that should be the one, guessing from the snzip readme.

martindurant · Dec 04 '20 17:12

You meant stream_decompress?

Yes, it still doesn't work; it throws:

snappy.UncompressError: stream missing snappy identifier

I tried to do what they did here, but it still did not work.

Sounds like that should be the one, guessing from the snzip readme.

I am just going to use a subprocess call to snzip, as that works. If there is a fix for this, please let me know.
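
Roughly what that fallback looks like (a hedged sketch: the -d and -c flags follow snzip's gzip-style interface and should be checked against snzip -h on your install):

    import subprocess

    # decompress via snzip: -d decompresses, -c writes the result to stdout
    # (flag behaviour is an assumption here; verify with the installed snzip)
    result = subprocess.run(
        ['snzip', '-d', '-c', 'libs/temp.snappy'],
        check=True, capture_output=True,
    )
    data = result.stdout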

mayurpande · Dec 04 '20 17:12

Sorry, I don't have any more suggestions for you. Perhaps someone else does.

martindurant · Dec 04 '20 17:12

Sorry, I don't have any more suggestions for you. Perhaps someone else does.

No problem, thank you for the guidance anyways. My colleague was having the same issue with chunk truncated as well.

mayurpande · Dec 04 '20 18:12

Anyone with any ideas for this?

mayurpande · Dec 17 '20 11:12

I have this issue as well. Any feedback is appreciated.

randomtask2000 · Jun 08 '21 03:06

You might want to try the package cramjam, which has cramjam.snappy.decompress and cramjam.snappy.decompress_raw for the framed and unframed formats, respectively. I don't believe it has a CLI, but you could request one.
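
For example, something along these lines (a hedged sketch: cramjam's snappy functions return a buffer object that bytes() can consume; the broad except here is just to fall back between the two formats):

    import cramjam

    with open('libs/temp.snappy', 'rb') as f:
        data = f.read()

    try:
        # framed (streaming) snappy format
        out = bytes(cramjam.snappy.decompress(data))
    except Exception:
        # raw / unframed snappy format
        out = bytes(cramjam.snappy.decompress_raw(data))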

martindurant · Jun 08 '21 13:06

@mayurpande

Sorry to bother you, but I have the same problem trying to read a Kinesis Firehose snappy file. Did you find a way to uncompress it?

Regards

omendoza-itera · Aug 23 '22 09:08

I don't believe it has a CLI, but you could request one.

Just released one, pip install cramjam-cli for anyone interested.

milesgranger · May 06 '23 14:05