python-snappy
How to handle snappy files generated by Trino?
Hello,
Since the 0.7.1 release, I can't decompress CSV files generated by Trino. I think the issue is related to Hadoop_snappy. Does anyone know how it can be fixed?
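```python
import io

from snappy import snappy_formats

csv_file = 'csv_67dba65a.snappy'


def read_file(file_path):
    return open(file_path, 'rb')


# Detect the format from the file header; read_chunk holds the bytes
# consumed during detection.
decompress_func, read_chunk = snappy_formats.get_decompress_function(
    'auto',
    read_file(csv_file)
)
decompressed_stream = io.BytesIO()
# Decompress the data. Note that a fresh file handle is passed here, so the
# bytes already held in read_chunk are read again from the start of the file.
decompress_func(
    read_file(csv_file),
    decompressed_stream,
    start_chunk=read_chunk
)
decompressed_stream.seek(0)
print(f"Compressed file: {read_file(csv_file).read()}")
print(f"DeCompressed file: {decompressed_stream.read()}")
```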
This code produces different output depending on the version:
- 0.7.0:

```
Compressed file: b'\x00\x00\x00\x04\x00\x00\x00\x06\x04\x0c"a"\n'
DeCompressed file: b'"a"\n"a"\n'
```

- 0.7.1:

```
.venv/lib/python3.12/site-packages/snappy/snappy_formats.py", line 64, in get_decompress_function
    decompress_func, read_chunk = guess_format_by_header(fin)
.venv/lib/python3.12/site-packages/snappy/snappy_formats.py", line 59, in guess_format_by_header
    raise UncompressError("Can't detect archive format")
snappy.snappy.UncompressError: Can't detect archive format
```
Since you are the second to ask, we might be able to re-implement this, at least for compress/decompress (as opposed to streaming).