rust-brotli
Feature: Add option to force compressed data to be byte aligned
This adds byte_align and bare_stream parameters that produce Brotli compressed blocks which can be assembled into a complete stream by trivial byte-wise concatenation.
To my knowledge, this functionality does not currently exist in any other Brotli implementation.
These options are exposed via the command line tool as --bytealign and --bare (bare mode enables byte alignment).
Byte align mode inserts an empty metadata block before the final empty block if the compressed data block is not already byte aligned.
Bare mode additionally omits the final empty block, and, in catable mode, the stream header.
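The tail logic described above can be sketched in Rust as follows. This is a minimal, hypothetical illustration built from the meta-block header layout in RFC 7932, not the code in this PR: `BitWriter`, `write_empty_metadata_block` and `finish_stream` are names invented for the example. It assumes the preceding compressed meta-block was written with ISLAST = 0, and it does not show how bare mode omits the stream header.

```rust
// Minimal LSB-first bit writer; Brotli (RFC 7932) packs bits starting at the
// least significant bit of each byte. Hypothetical helper, not rust-brotli's API.
struct BitWriter {
    bytes: Vec<u8>,
    bit_pos: usize, // total number of bits written so far
}

impl BitWriter {
    fn new() -> Self {
        BitWriter { bytes: Vec::new(), bit_pos: 0 }
    }

    // Write the low `n_bits` of `value`, least significant bit first.
    fn write_bits(&mut self, n_bits: usize, value: u64) {
        for i in 0..n_bits {
            if self.bit_pos % 8 == 0 {
                self.bytes.push(0); // start a fresh, zeroed byte
            }
            let bit = ((value >> i) & 1) as u8;
            let last = self.bytes.len() - 1;
            self.bytes[last] |= bit << (self.bit_pos % 8);
            self.bit_pos += 1;
        }
    }

    // Skip to the next byte boundary; the skipped bits are already zero.
    fn pad_to_byte(&mut self) {
        let rem = self.bit_pos % 8;
        if rem != 0 {
            self.bit_pos += 8 - rem;
        }
    }

    fn is_byte_aligned(&self) -> bool {
        self.bit_pos % 8 == 0
    }
}

// Empty metadata meta-block: ISLAST = 0, MNIBBLES code 3 (metadata),
// reserved bit 0, MSKIPBYTES = 0 (no metadata bytes), then zero fill bits.
fn write_empty_metadata_block(w: &mut BitWriter) {
    w.write_bits(1, 0); // ISLAST = 0
    w.write_bits(2, 3); // MNIBBLES code 3 => metadata meta-block
    w.write_bits(1, 0); // reserved, must be zero
    w.write_bits(2, 0); // MSKIPBYTES = 0 => MSKIPLEN = 0
    w.pad_to_byte();    // fill bits up to the byte boundary
}

// Byte align: insert an empty metadata block only if not already aligned.
// Non-bare: then append the final empty meta-block (ISLAST = 1, ISLASTEMPTY = 1).
fn finish_stream(w: &mut BitWriter, bare: bool) {
    if !w.is_byte_aligned() {
        write_empty_metadata_block(w);
    }
    if !bare {
        w.write_bits(1, 1); // ISLAST = 1
        w.write_bits(1, 1); // ISLASTEMPTY = 1
        w.pad_to_byte();
    }
}

fn main() {
    // Pretend the compressed data ended 3 bits into a byte.
    let mut w = BitWriter::new();
    w.write_bits(3, 0b101);
    finish_stream(&mut w, false);
    assert_eq!(w.bytes, vec![0x35, 0x00, 0x03]);
}
```

The empty metadata block is 6 header bits plus zero fill, so it always ends exactly on a byte boundary no matter where the compressed data stopped. That is what makes the output safe to splice with plain byte copies.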
TODO
- [x] don't emit empty penultimate metadata block if the compressed block ended on a byte boundary
- [x] add support for emitting "bare" compressed data without a stream header or empty last block
- [x] don't bother with byte aligning the start in appendable mode
- [x] plumb for use by libraries
- [ ] update documentation
- [ ] add tests
Notes to maintainers
This is the first time I've ever touched Rust code, so please forgive me if I've done something silly.
@philippeitis @danielrh can I get some feedback on this, please?
I'm not actually a maintainer, but my feedback is largely nit-picking code style. However, given that the codebase itself was machine-generated, I don't think that's a major issue. Otherwise, I think providing a link in the PR to relevant documentation of this feature in other brotli implementations (if it exists, otherwise saying that it's original would also be fine) would be helpful. Again, not a maintainer, so take my comments with a grain of salt.
Thanks for following up. I was basing my additions on the existing codebase; I didn't realize it was machine-generated. I think it'd be mildly confusing to mix e.g. (*s).field and s.field, but it's not my repo, and I'm fine doing it whichever way will get merged.
To the best of my knowledge, this feature doesn't exist in any other Brotli implementation, and I can edit the PR description to reflect that a bit later. I can expand a bit on the motivation as well.
Hi! Thanks for doing this, and sorry I've been away for a bit, but I'm back now. I'd love to hear about the motivation for this patch and what problem it's trying to solve. Is it taking the catable command one step further, to allow vanilla cat instead of the special broccoli cat tools to combine files?
@danielrh Yes, the idea is to produce substreams which can be assembled with the generic cat command or other tools that do naive concatenation/copying of bytes. Simply copying bytes ought to be faster than having to do even minor bit shuffling, though I feel like simplicity is a bigger win here.
This could be used in a number of scenarios, but these are the two main examples I had in mind:
- Stitching together multiple blocks of precompressed HTML, JavaScript and/or CSS into a stream
- Limited templating with precompressed data, e.g. a precompressed JavaScript IIFE with an object literal embedded at the very end as a data parameter
What do we want to do about the issue of changes matching existing code conventions vs. changes being reasonably idiomatic Rust code? My sensibilities swing towards keeping code style consistent, but it's your call.
Ok, this is really cool! We have a scenario similar to https://dropbox.tech/infrastructure/-broccoli--syncing-faster-by-syncing-less where our customers upload to a content-addressable store.
Create a header block:
touch empty.bin
--appendable --bytealign --bare -c empty.bin start.br
Individually compress the actual data blocks like this:
--catable --bytealign --bare -c block001 block001.br
Then fake an ISLASTEMPTY metablock:
printf "\x03" > end.br
Then you can:
- Decompress individual blocks by a) prepending the start block and b) appending the end block:
  cat start.br block001.br end.br | brotli -d -
- Have a standard decompressor (e.g. curl --compressed) recreate the whole file from the concatenation of all the compressed blocks, e.g.:
  cat start.br block*.br end.br | brotli -d -
- Rearrange the order of the compressed blocks, if you need to, to rearrange the order of the output.
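Because every piece is byte aligned, assembling the stream is nothing more than copying bytes. The Rust sketch below is a hypothetical, byte-for-byte equivalent of the cat pipeline with an explicit block order. `assemble_stream` and `LAST_EMPTY_METABLOCK` are names made up for illustration. The placeholder bytes in `main` are not real Brotli data. Reordering is only meaningful for blocks compressed independently as described in the thread (--catable).

```rust
// ISLAST = 1 (bit 0) and ISLASTEMPTY = 1 (bit 1), remaining bits zero: the
// same byte the `printf "\x03"` step writes. Only valid after a byte-aligned block.
const LAST_EMPTY_METABLOCK: [u8; 1] = [0x03];

// Byte-wise equivalent of `cat start.br block*.br end.br`, with `order`
// choosing which block goes where. Hypothetical helper for illustration.
fn assemble_stream(start: &[u8], blocks: &[&[u8]], order: &[usize]) -> Vec<u8> {
    let body_len: usize = order.iter().map(|&i| blocks[i].len()).sum();
    let mut out = Vec::with_capacity(start.len() + body_len + LAST_EMPTY_METABLOCK.len());
    out.extend_from_slice(start);
    for &i in order {
        out.extend_from_slice(blocks[i]);
    }
    out.extend_from_slice(&LAST_EMPTY_METABLOCK);
    out
}

fn main() {
    // Placeholder bytes standing in for start.br, block001.br and block002.br.
    let start: &[u8] = &[0xAA];
    let block001: &[u8] = &[0x01, 0x02];
    let block002: &[u8] = &[0x03];
    // Emit block002 before block001.
    let stream = assemble_stream(start, &[block001, block002], &[1, 0]);
    assert_eq!(stream, vec![0xAA, 0x03, 0x01, 0x02, 0x03]);
}
```

No bit shifting happens anywhere; this is the simplicity argument made earlier in the thread.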
@ryancdotorg Have you done anything more with this idea?
I brought it over to https://github.com/johnterickson/BrotliSharpLib/tree/bytealign
@johnterickson No, haven't been working on it. All this PR needs is a rebase, tests and updated docs...