rust-brotli

Feature: Add option to force compressed data to be byte aligned

Open ryancdotorg opened this issue 5 years ago • 12 comments

This adds byte_align and bare_stream parameters to enable production of Brotli compressed blocks that can be used to construct a complete stream with trivial byte-wise concatenation.

To my knowledge, this functionality does not currently exist in any other Brotli implementation.

These options are exposed via the command line tool as --bytealign and --bare (bare mode enables byte alignment).

Byte align mode inserts an empty metadata block before the final empty block if the compressed data block is not already byte aligned.

Bare mode additionally omits the final empty block, and, in catable mode, the stream header.
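
As a rough illustration of the two modes described above, here is a minimal sketch of the bit-level layout, based on the meta-block header format in RFC 7932 rather than on the code in this PR. `BitWriter` and its methods are hypothetical names invented for this example; they are not part of rust-brotli's API, and the actual implementation may differ.

```rust
/// Minimal LSB-first bit writer (Brotli packs bits least-significant first).
/// Hypothetical helper for illustration only, not rust-brotli's API.
struct BitWriter {
    bytes: Vec<u8>,
    bit_len: usize, // total number of bits written so far
}

impl BitWriter {
    fn new() -> Self {
        BitWriter { bytes: Vec::new(), bit_len: 0 }
    }

    /// Write the low `n` bits of `value`, least-significant bit first.
    fn write_bits(&mut self, value: u32, n: usize) {
        for i in 0..n {
            if self.bit_len % 8 == 0 {
                self.bytes.push(0); // start a fresh, zeroed byte
            }
            let bit = ((value >> i) & 1) as u8;
            let last = self.bytes.len() - 1;
            self.bytes[last] |= bit << (self.bit_len % 8);
            self.bit_len += 1;
        }
    }

    /// Zero-fill up to the next byte boundary (new bytes start zeroed).
    fn pad_to_byte(&mut self) {
        self.bit_len = self.bytes.len() * 8;
    }

    /// Byte align mode: if the stream is not byte aligned, emit an empty
    /// metadata meta-block, whose header ends with fill bits.
    fn byte_align(&mut self) {
        if self.bit_len % 8 == 0 {
            return; // already aligned: nothing to emit
        }
        self.write_bits(0, 1);    // ISLAST = 0
        self.write_bits(0b11, 2); // MNIBBLES code 3: metadata meta-block
        self.write_bits(0, 1);    // reserved bit, must be zero
        self.write_bits(0, 2);    // MSKIPBYTES = 0: no metadata bytes follow
        self.pad_to_byte();
    }

    /// Final empty meta-block (ISLAST = 1, ISLASTEMPTY = 1).
    /// Bare mode would skip this call.
    fn finish(&mut self) {
        self.write_bits(0b11, 2);
        self.pad_to_byte();
    }
}
```

In this sketch, calling `byte_align()` on an already aligned writer emits nothing, and calling `finish()` on an aligned writer appends the single byte 0x03.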

TODO

  • [x] don't emit empty penultimate metadata block if the compressed block ended on a byte boundary
  • [x] add support for emitting "bare" compressed data without a stream header or empty last block
  • [x] don't bother with byte aligning the start in appendable mode
  • [x] plumb for use by libraries
  • [ ] update documentation
  • [ ] add tests

Notes to maintainers

This is the first time I've ever touched Rust code, so please forgive me if I've done something silly.

ryancdotorg avatar Nov 22 '20 14:11 ryancdotorg

@philippeitis @danielrh can i get some feedback on this please?

ryancdotorg avatar Dec 15 '20 12:12 ryancdotorg

I'm not actually a maintainer, but my feedback is largely nit-picking code style. However, given that the codebase itself was machine-generated, I don't think that's a major issue. Otherwise, I think providing a link in the PR to relevant documentation of this feature in other brotli implementations (if it exists, otherwise saying that it's original would also be fine) would be helpful. Again, not a maintainer, so take my comments with a grain of salt.

philippeitis avatar Dec 16 '20 22:12 philippeitis

Thanks for following up. I was basing my additions on the existing codebase; I didn't realize it was machine generated. I think it'd be mildly confusing to mix use of e.g. (*s).field and s.field, but it's not my repo, and I'm fine doing it however will get merged.

To the best of my knowledge, this feature doesn't exist in any other Brotli implementation, and I can edit the PR description to reflect that a bit later. I can expand a bit on the motivation as well.

ryancdotorg avatar Dec 16 '20 22:12 ryancdotorg

Hi! Thanks for doing this--sorry I've been away for a bit but I'm back now...I'd love to hear about the motivation of this patch and what problem it's trying to solve...is it taking the catable command one step further to allow vanilla cat instead of the special broccoli cat tools to combine files?

danielrh avatar Jul 31 '21 20:07 danielrh

@danielrh Yes, the idea is to produce substreams which can be assembled with the generic cat command or other tools that do naive concatenation/copying of bytes. Simply copying bytes ought to be faster than having to do even minor bit shuffling, though I feel like simplicity is a bigger win here.

This could be used in a number of scenarios, but these are the two main examples I had in mind:

  • Stitching together multiple blocks of precompressed HTML, JavaScript and/or CSS into a stream
  • Limited templating with precompressed data, e.g. a precompressed JavaScript IIFE with an object literal embedded at the very end as a data parameter

ryancdotorg avatar Aug 02 '21 03:08 ryancdotorg

What do we want to do about the issue of changes matching existing code conventions vs changes being reasonably idiomatic Rust code? My sensibilities swing towards keeping code style consistent, but it's your call.

ryancdotorg avatar Oct 18 '21 04:10 ryancdotorg

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Apr 16 '22 21:04 CLAassistant

Ok this is really cool! We have a scenario that is similar to https://dropbox.tech/infrastructure/-broccoli--syncing-faster-by-syncing-less where our customers upload to a content-addressable store.

Create a header block:

    touch empty.bin
    --appendable --bytealign --bare -c empty.bin start.br

Individually compress the actual data blocks like this:

    --catable --bytealign --bare -c block001 block001.br

Then fake an ISLASTEMPTY metablock:

    printf "\x03" > ~/end.br

Then you can:

  1. Decompress individual blocks by a) prepending the start block and b) appending the end block: cat start.br block001.br end.br | brotli -d
  2. A standard decompressor (e.g. curl --compressed) can recreate the whole file from the concatenation of all the compressed blocks: e.g. cat start.br block*.br end.br | brotli -d
  3. If you need to, rearrange the compressed blocks order to rearrange the output order
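
The assembly step in the list above amounts to plain byte copying. A minimal sketch, assuming the pieces have already been read into memory: `assemble` is a hypothetical helper written for this example, and its arguments stand in for the contents of start.br, the block*.br files, and end.br.

```rust
/// Assemble a complete stream by byte-wise concatenation, with no bit
/// shifting. Hypothetical helper for illustration; `start`, `blocks` and
/// `end` stand in for the contents of start.br, block*.br and end.br.
fn assemble(start: &[u8], blocks: &[&[u8]], end: &[u8]) -> Vec<u8> {
    let total = start.len()
        + blocks.iter().map(|b| b.len()).sum::<usize>()
        + end.len();
    let mut out = Vec::with_capacity(total);
    out.extend_from_slice(start);
    for block in blocks {
        // Each block is byte aligned, so a plain copy is enough.
        out.extend_from_slice(block);
    }
    out.extend_from_slice(end);
    out
}
```

Passing the blocks in a different order rearranges the output in the same way, and passing a single block gives the stream for that block alone.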

johnterickson avatar Nov 09 '23 20:11 johnterickson

@ryancdotorg Have you done anything more with this idea?

I brought it over to https://github.com/johnterickson/BrotliSharpLib/tree/bytealign

johnterickson avatar Nov 12 '23 15:11 johnterickson

@johnterickson No, haven't been working on it. All this PR needs is a rebase, tests and updated docs...

ryancdotorg avatar Nov 12 '23 19:11 ryancdotorg