python-snappy
Decompress Chunk Truncated error
I have tried to follow the readme and run this command like so:

```
python -m snappy -d temp.snappy temp.txt
```

However I get the error `UncompressError: chunk truncated`.
Also when I try to use it within my script it fails saying:

`snappy.UncompressError: Error while decompressing: invalid input`

However I have a bytes-like object:

```python
import snappy

with open('libs/temp.snappy', 'rb') as f:
    data = f.read()
snappy.uncompress(data)
```
How did you make the file?
Note the difference between `decompress` and `stream_decompress`.
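If it helps to check which container a file uses before picking a function, here is a stdlib-only sketch (the helper name is mine, not part of python-snappy): the framed stream format always begins with a stream identifier chunk (type 0xff, little-endian length 6, payload `sNaPpY`), while raw blocks and Hadoop streams carry no magic bytes.

```python
# Magic prefix of the snappy framing format: chunk type 0xff,
# 3-byte little-endian length 6, then the identifier b"sNaPpY".
FRAMED_MAGIC = b"\xff\x06\x00\x00sNaPpY"

def sniff_snappy_format(data: bytes) -> str:
    """Best-effort guess at which snappy container *data* uses."""
    if data.startswith(FRAMED_MAGIC):
        # Framed stream: use stream_decompress / StreamDecompressor.
        return "framed"
    # Raw blocks and Hadoop streams have no magic; they cannot be told
    # apart reliably without actually trying to decode them.
    return "raw-or-hadoop"
```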
It is compressed in S3, then I download it.
It is compressed in S3
How was it created?
It is fed through Firehose and then the Firehose handles the compression.
So, this library has three decompress functions; you should try each. Otherwise, you will need to get the details of what Firehose is doing for you. This isn't a parquet file, right?
No, it isn't parquet. The Firehose documentation says:
S3 compression and encryption Kinesis Data Firehose can compress records before delivering them to your S3 bucket. Compressed records can also be encrypted in the S3 bucket using a KMS master key.
So I have tried `uncompress` and `stream_compress` and both do not work; I will try `decompress` now.
Kinesis Data Firehose can compress records before delivering them to your S3 bucket.
Sorry, that doesn't give us much to work from. Also, don't forget `hadoop_stream_decompress`.
It just means that it compresses into a specified format, with the ability to choose from:
- Disabled
- GZIP
- Snappy
- Zip
- Hadoop-Compatible Snappy
Also, don't forget `hadoop_stream_decompress`.
```python
import snappy

with open('libs/temp.snappy', 'rb') as f:
    data = f.read()
decom = snappy.hadoop_snappy.StreamDecompressor()
un = decom.decompress(data)
```
This is the only thing that didn't throw an error; however, it returns an empty bytes string. But when I use the mac `snzip` command line tool it uncompresses the file.
I have tried to follow the readme and run this command like so: `python -m snappy -d temp.snappy temp.txt`. However I get the error `UncompressError: chunk truncated`.
Also there is still the issue of this. Not too sure why all the methods are failing, but as I have mentioned I am able to use `snzip` from my command line.
So I have tried `uncompress` and `stream_compress`
You meant `stream_decompress`? Sounds like that should be the one, guessing from the snzip readme.

You meant `stream_decompress`?
Yes, it still doesn't work; it throws `snappy.UncompressError: stream missing snappy identifier`. I tried to do what they did here but it still did not work.
Sounds like that should be the one, guessing from the snzip readme.
I am just going to use a subprocess call to `snzip`, as that works. If there is a fix for this, please let me know.
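For reference, that workaround can be as small as shelling out to snzip, assuming it is on the PATH (the wrapper below is a hypothetical sketch, not part of any library; `snzip -d` decompresses the file in place):

```python
import subprocess

def snzip_cmd(path: str) -> list[str]:
    # Build the command line; snzip -d decompresses the given file.
    return ["snzip", "-d", path]

def snzip_decompress(path: str) -> None:
    # Shell out to the snzip CLI; raises CalledProcessError on failure.
    subprocess.run(snzip_cmd(path), check=True)
```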
Sorry, I don't have any more suggestions for you. Perhaps someone else does.
Sorry, I don't have any more suggestions for you. Perhaps someone else does.
No problem, thank you for the guidance anyway. My colleague was having the same `chunk truncated` issue as well.
Anyone with any ideas for this?
I have this issue as well. Any feedback is appreciated.
You might want to try the package cramjam, which has `cramjam.snappy.decompress` and `cramjam.snappy.decompress_raw` for the framed and unframed formats, respectively. I don't believe it has a CLI, but you could request one.
@mayurpande
Sorry to bother you, but I have the same problem trying to read a Kinesis Firehose snappy file. Did you find a way to uncompress it?
Regards
I don't believe it has a CLI, but you could request one.

Just released one, `pip install cramjam-cli`, for anyone interested.