
Suggestion: Support for streamed decompression?

Open simondotm opened this issue 3 years ago • 5 comments

Hi there, thanks for this great project.

A while ago, I wrote a custom 8-bit oriented LZ4 compressor/decoder (but nowhere near as optimal as lzsa!), mainly to solve a particular problem of "streamed decompression", where we want to partially decompress data on the fly but without requiring full access to all of the previously decompressed data stream.

This is useful in 8-bit scenarios where, for example, we might be decompressing video or audio data to be consumed byte-by-byte through a small in-memory buffer, and it is neither practical nor desirable to decompress the whole thing in one go due to memory or latency constraints.

In my custom modification to LZ4 I achieved this by just limiting the window size for match offsets (similar to BLOCK_SIZE in lzsa, I suspect) and setting it to a user-provided command line value (in my use case, anywhere from 256 bytes to 2048 bytes).
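As a rough illustration of the idea (not the LZ4 or lzsa format, and not the code from either project), here is a minimal greedy parser sketch in Python. The token layout, `compress_tokens`, `decompress_tokens`, and the `min_match` parameter are all hypothetical names made up for this example; the only point is that the match search never looks further back than `window_size` bytes.

```python
def compress_tokens(data: bytes, window_size: int, min_match: int = 3):
    """Toy greedy LZ parser (hypothetical format, for illustration only).

    Emits ("lit", bytes) and ("match", offset, length) tokens.
    Match offsets are never larger than window_size.
    """
    tokens = []
    literals = bytearray()
    pos = 0
    while pos < len(data):
        best_len, best_off = 0, 0
        # Only consider candidates inside the last window_size bytes.
        start = max(0, pos - window_size)
        for cand in range(start, pos):
            length = 0
            # Overlapping matches are allowed, as in typical LZ schemes.
            while (pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - cand
        if best_len >= min_match:
            if literals:
                tokens.append(("lit", bytes(literals)))
                literals = bytearray()
            tokens.append(("match", best_off, best_len))
            pos += best_len
        else:
            literals.append(data[pos])
            pos += 1
    if literals:
        tokens.append(("lit", bytes(literals)))
    return tokens


def decompress_tokens(tokens) -> bytes:
    """Reference decoder for the toy token stream above."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out += tok[1]
        else:
            _, offset, length = tok
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)
```

A real compressor would use a hash chain or suffix structure instead of this brute-force search, but the window limit would be applied in the same place: when choosing which earlier positions are eligible as match sources.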

In this way, we know the decoder will never need to persist more than WINDOW_SIZE previously decompressed bytes in memory, so all we need is a WINDOW_SIZE memory buffer on the decoder side, and some fairly trivial helper functions to supply decompressed bytes one at a time from the compressed data stream. (I just implemented a simple state machine in my 6502 decoder to keep track of ongoing literal and match runs to facilitate fetching of individual bytes.)
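The decoder side can be sketched along the same lines. This is a Python stand-in for the kind of state machine described above, not the 6502 code and not the lzsa stream format: the `StreamDecoder` class, its `next_byte` method, and the `("lit", bytes)` / `("match", offset, length)` tokens are assumptions made for the example. It keeps only a `window_size` ring buffer of history and hands out one byte per call.

```python
class StreamDecoder:
    """Byte-at-a-time decoder over a toy token stream (hypothetical format).

    Tokens are ("lit", bytes) or ("match", offset, length), with every
    offset assumed to be in the range 1..window_size.
    """

    def __init__(self, tokens, window_size: int):
        self.tokens = iter(tokens)
        self.window_size = window_size
        self.window = bytearray(window_size)  # ring buffer of recent output
        self.write_pos = 0
        self.lit = b""        # current literal run
        self.lit_pos = 0      # next index to read within the literal run
        self.match_off = 0    # offset of the current match run
        self.match_left = 0   # bytes remaining in the current match run

    def next_byte(self):
        """Return the next decompressed byte, or None at end of stream."""
        # Fetch tokens until a literal or match run is in progress.
        while self.lit_pos >= len(self.lit) and self.match_left == 0:
            tok = next(self.tokens, None)
            if tok is None:
                return None
            if tok[0] == "lit":
                self.lit, self.lit_pos = tok[1], 0
            else:
                _, self.match_off, self.match_left = tok
                if not 1 <= self.match_off <= self.window_size:
                    raise ValueError("match offset exceeds window size")
        if self.lit_pos < len(self.lit):
            value = self.lit[self.lit_pos]
            self.lit_pos += 1
        else:
            # Read from history before overwriting the current slot, so an
            # offset equal to window_size still sees the old byte.
            src = (self.write_pos - self.match_off) % self.window_size
            value = self.window[src]
            self.match_left -= 1
        self.window[self.write_pos] = value
        self.write_pos = (self.write_pos + 1) % self.window_size
        return value


# Usage: pull bytes on demand with only a 4-byte history buffer.
decoder = StreamDecoder(
    [("lit", b"abc"), ("match", 3, 6), ("lit", b"xy"), ("match", 4, 4)],
    window_size=4,
)
decoded = bytearray()
while (byte := decoder.next_byte()) is not None:
    decoded.append(byte)
```

On an 8-bit target a power-of-two window such as 256 would let the ring-buffer wraparound fall out of ordinary 8-bit index arithmetic, which is presumably part of why small windows are attractive there.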

Naturally, setting a smaller window size for match offsets will degrade compression ratio, but we can happily accept that trade-off in exchange for the streamed decompression capability. I still achieved pretty decent ratios even with a tiny 256 byte window.

In summary, do you think the ability to specify the maximum match offset window size would be a feasible possibility for lzsa to support? Thanks!

simondotm avatar Jan 02 '21 12:01 simondotm

Hey,

Sorry for the delay in replying. Here is a patch to implement an optional window size setting: max_window.zip

Once you apply this and build the lzsa tool, you can use -w<max_value> to compress with a maximum offset value, e.g. -w256 would never use offsets larger than 256. (Entire parts of the decompressor may then be unused for that particular file.)

Let me know if that works for you, and I am happy to merge this optional feature in.

Thanks!

emmanuel-marty avatar Jan 10 '21 16:01 emmanuel-marty

Hi Emmanuel, that's fantastic, thanks. While I'm eager to test it, unfortunately I have no tooling to compile C/C++ code at the moment, as I develop in Python/Node on Windows most of the time. I'll see if I can find a way to build it, but it may be a while before I can get back to you. Cheers

simondotm avatar Jan 10 '21 21:01 simondotm

Oh, no worries, I will build a modified exe for you today and you can let me know if the feature is what you need. Thanks for speaking up :)

emmanuel-marty avatar Jan 11 '21 10:01 emmanuel-marty

Here is a build with the -w option: lzsa_win64_1.3.6_maxwindow.zip

You can use e.g. -w512 to use a max offset of 512, or whatever value you need. All the other flags are as usual.

Let me know if that works for your needs and I'll be happy to merge the changes. If not, let me know and we can work on it further.

Obviously, when you limit the offset like that, you can also consider commenting out the parts of the depacker that are unused, if you will only use it with max-offset-limited data.

Best regards, Emmanuel

emmanuel-marty avatar Jan 11 '21 11:01 emmanuel-marty

Thanks so much Emmanuel. I will give this a try with some test data sometime this week and let you know how I get on.

simondotm avatar Jan 11 '21 13:01 simondotm