lz4-java

Support for dependent blocks in decompression

Open cnuernber opened this issue 3 years ago • 3 comments

Reading an Apache Arrow file, we got:

Dependent block stream is unsupported (BLOCK_INDEPENDENCE must be set).
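For context, the message refers to the Block Independence flag in the LZ4 frame header. That flag is bit 5 of the FLG byte, which follows the 4-byte magic number 0x184D2204. The sketch below checks the flag before the frame reaches a decoder, so a caller could route dependent-block frames elsewhere instead of hitting the exception. It assumes the layout in the LZ4 Frame Format specification and reads only the first header bytes. `Lz4FrameHeader` and `hasIndependentBlocks` are hypothetical names and are not part of lz4-java.

```java
/**
 * Hypothetical helper that inspects the start of an LZ4 frame
 * (layout per the LZ4 Frame Format specification). Not part of lz4-java.
 */
final class Lz4FrameHeader {
    /** LZ4 frame magic number, stored little-endian as bytes 04 22 4D 18. */
    static final int MAGIC = 0x184D2204;

    /** Returns true if the Block Independence bit (bit 5 of the FLG byte) is set. */
    static boolean hasIndependentBlocks(byte[] frame) {
        // Magic number (4 bytes) + FLG + BD is the smallest header prefix we look at.
        if (frame.length < 6) throw new IllegalArgumentException("too short for an LZ4 frame header");
        int magic = (frame[0] & 0xFF) | (frame[1] & 0xFF) << 8
                  | (frame[2] & 0xFF) << 16 | (frame[3] & 0xFF) << 24;
        if (magic != MAGIC) throw new IllegalArgumentException("not an LZ4 frame (bad magic number)");
        int flg = frame[4] & 0xFF;
        int version = flg >>> 6; // bits 7-6 of FLG must be 01
        if (version != 1) throw new IllegalArgumentException("unsupported frame version " + version);
        return (flg & 0x20) != 0; // bit 5: Block Independence
    }
}
```

For instance, a header starting with the bytes 04 22 4D 18 64 40 (FLG 0x64) reports independent blocks. The same header with FLG 0x44 reports dependent ones.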

Is there any interest in supporting this feature? Our system already decompresses columns in parallel, so block-level parallelism in decompression isn't necessary. My thought is to simply concatenate all blocks and decompress them in one shot.
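One caveat on "concatenate all blocks": the compressed bytes can't simply be spliced together. Each LZ4 block ends with a literals-only sequence, and the decoder relies on reaching the end of the block's input to recognize it. What does work is decoding the blocks in order into one contiguous output array, so a match in a later block can copy bytes that an earlier block produced.

Below is a minimal sketch of that idea. It assumes raw LZ4 block payloads have already been extracted from the frame and that the total uncompressed size is known. `DependentBlockDecoder` is a hypothetical name. The sketch does no frame parsing, no checksum verification, and no bounds hardening.

```java
/**
 * Hypothetical minimal decoder for dependent LZ4 blocks (raw block format only).
 * All blocks are decoded, in order, into one shared output array, so a match in
 * block N may copy bytes produced by blocks 0..N-1.
 */
final class DependentBlockDecoder {
    /** Decodes one compressed block into dst starting at dOff; returns the number of bytes written. */
    static int decodeBlock(byte[] src, int sOff, int sLen, byte[] dst, int dOff) {
        int sp = sOff, dp = dOff;
        final int sEnd = sOff + sLen;
        while (sp < sEnd) {
            int token = src[sp++] & 0xFF;
            int litLen = token >>> 4; // high nibble: literal length, 15 = extension bytes follow
            if (litLen == 15) { int b; do { b = src[sp++] & 0xFF; litLen += b; } while (b == 255); }
            System.arraycopy(src, sp, dst, dp, litLen);
            sp += litLen;
            dp += litLen;
            if (sp >= sEnd) break; // the last sequence of a block has literals only
            int offset = (src[sp] & 0xFF) | ((src[sp + 1] & 0xFF) << 8); // little-endian
            sp += 2;
            // Dependent blocks: the match may start before dOff, but never before dst[0].
            if (offset == 0 || offset > dp) throw new IllegalStateException("invalid match offset " + offset);
            int matchLen = (token & 0x0F) + 4; // low nibble: match length minus 4
            if ((token & 0x0F) == 15) { int b; do { b = src[sp++] & 0xFF; matchLen += b; } while (b == 255); }
            // Byte-by-byte copy so overlapping matches (offset < matchLen) repeat correctly.
            for (int m = dp - offset, end = dp + matchLen; dp < end; ) dst[dp++] = dst[m++];
        }
        return dp - dOff;
    }

    /** Decodes all blocks in order into one array of exactly totalSize bytes. */
    static byte[] decodeAll(byte[][] blocks, int totalSize) {
        byte[] out = new byte[totalSize];
        int pos = 0;
        for (byte[] block : blocks) pos += decodeBlock(block, 0, block.length, out, pos);
        if (pos != totalSize) throw new IllegalStateException("expected " + totalSize + " bytes, got " + pos);
        return out;
    }
}
```

For example, take a first block {0x40, 'a', 'b', 'c', 'd'} followed by a dependent block {0x04, 0x04, 0x00, 0x10, 'e'}, which encodes a match of length 8 at offset 4 followed by the literal e. Together they decode to abcdabcdabcde. Decoding the second block on its own fails, because its first match points before the block's own start. That cross-block reference is exactly the dependency that BLOCK_INDEPENDENCE rules out.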

cnuernber · Mar 08 '22 23:03

The workaround for this is to use zstd; unfortunately, lz4 is the default format for many of these pathways.

cnuernber · Mar 09 '22 14:03

The Go implementation manually resizes the dictionary: https://github.com/pierrec/lz4/blob/v4/reader.go#L180
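Holding the whole output in memory isn't strictly required. An LZ4 match offset is 16 bits, so a dependent block can reach back at most 65,535 bytes. A bounded-memory variant therefore works in three steps:

1. Keep only the most recent 64 KB of output as a history prefix.
2. Decode the next block directly after that prefix.
3. Slide the window forward.

This follows the same general idea as maintaining a dictionary between blocks. It is a sketch of that technique under the same assumptions as above (raw block payloads, no checksums), not a transcription of the Go reader. `WindowedBlockDecoder` is a hypothetical name. The window size is a constructor parameter only so it can be exercised with small inputs; real callers would pass `LZ4_WINDOW`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/**
 * Hypothetical streaming decoder for dependent LZ4 blocks with a bounded history window.
 * Raw block payloads only: no frame parsing, no checksum verification, no bounds hardening.
 */
final class WindowedBlockDecoder {
    /** LZ4 match offsets are 16-bit (at most 65535), so 64 KB of history is always enough. */
    static final int LZ4_WINDOW = 64 * 1024;

    private final int window;
    private final byte[] buf;  // layout: [history (<= window bytes) | output of the block being decoded]
    private final OutputStream out;
    private int histLen = 0;   // number of history bytes currently at the front of buf

    WindowedBlockDecoder(int window, int maxBlockSize, OutputStream out) {
        this.window = window;
        this.buf = new byte[window + maxBlockSize];
        this.out = out;
    }

    /** Decodes one block after the current history, writes it to out, then slides the window. */
    void decodeBlock(byte[] src) {
        int n = decode(src, buf, histLen);
        try {
            out.write(buf, histLen, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int total = histLen + n;
        int keep = Math.min(total, window);
        // Keep only the newest `keep` bytes; System.arraycopy handles the overlapping move.
        System.arraycopy(buf, total - keep, buf, 0, keep);
        histLen = keep;
    }

    /** Plain LZ4 block sequence decoding; matches may reach back into the history prefix. */
    private static int decode(byte[] src, byte[] dst, int dOff) {
        int sp = 0, dp = dOff;
        while (sp < src.length) {
            int token = src[sp++] & 0xFF;
            int litLen = token >>> 4; // high nibble: literal length, 15 = extension bytes follow
            if (litLen == 15) { int b; do { b = src[sp++] & 0xFF; litLen += b; } while (b == 255); }
            System.arraycopy(src, sp, dst, dp, litLen);
            sp += litLen;
            dp += litLen;
            if (sp >= src.length) break; // the last sequence of a block has literals only
            int offset = (src[sp] & 0xFF) | ((src[sp + 1] & 0xFF) << 8); // little-endian
            sp += 2;
            if (offset == 0 || offset > dp) throw new IllegalStateException("invalid match offset " + offset);
            int matchLen = (token & 0x0F) + 4; // low nibble: match length minus 4
            if ((token & 0x0F) == 15) { int b; do { b = src[sp++] & 0xFF; matchLen += b; } while (b == 255); }
            for (int m = dp - offset, end = dp + matchLen; dp < end; ) dst[dp++] = dst[m++]; // overlap-safe
        }
        return dp - dOff;
    }
}
```

The design trades one extra copy of at most 64 KB per block for memory bounded by the window plus one maximum block size. The frame format caps block size at 4 MB, whereas the first approach holds the whole decompressed column.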

The Java code completely hides the dictionary, which I think makes this impossible to do via simple updates to LZ4FrameInputStream.

@jpountz: would a simple update to the Java bindings to support frames with dependent blocks be a viable pathway? Another option would be to call the C library directly via FFI bindings.

cnuernber · Mar 09 '22 14:03

I was able to work around this (hopefully temporarily) using FFI bindings to the C library. Unfortunately, this means users need to ensure liblz4 is installed on their system.

cnuernber · Mar 09 '22 20:03