
Idea: deflate compatible file format

Open orent opened this issue 2 years ago • 4 comments

One thing that limits the applicability of streaming verification is the use of a non-standard format. I believe it could help adoption if the same unmodified file or URL could be used for both verified and unverified streaming.

This can be done by masquerading as a deflated stream.

The stream would consist of alternating blocks: uncompressed stored blocks (block type 00) carrying the data, and "compressed" dynamic Huffman blocks (block type 10) that contain only a fake Huffman table holding the verification hashes, with no actual data encoded using that table. Any standard deflate decoder could process this interleaved stream into the original data. The block sizes can be fixed and deterministic, supporting random access to any position.
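
To make the stored-block half of this concrete, here is a minimal Python sketch. It only shows that fixed-size stored blocks decode with a stock decoder and give predictable offsets. It does not attempt the hash-carrying dynamic Huffman blocks, which would need careful bit-level construction. The names `stored_blocks`, `block_offset` and `BLOCK_SIZE` are made up for illustration, and the block size is an arbitrary assumption.

```python
import struct
import zlib

BLOCK_SIZE = 1024  # assumed for illustration; must be <= 65535 for a stored block


def stored_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Wrap data in raw-deflate stored blocks (block type 00) of fixed size."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    out = bytearray()
    for idx, chunk in enumerate(chunks):
        is_final = idx == len(chunks) - 1
        # Header byte: BFINAL in bit 0, BTYPE=00 in bits 1-2, rest is padding.
        out.append(1 if is_final else 0)
        # LEN and its one's complement NLEN, both little-endian 16-bit.
        out += struct.pack("<HH", len(chunk), len(chunk) ^ 0xFFFF)
        out += chunk
    return bytes(out)


def block_offset(index: int, block_size: int = BLOCK_SIZE) -> int:
    """Byte offset of stored block `index`; each full block costs 5 header bytes."""
    return index * (5 + block_size)


data = bytes(range(256)) * 10          # 2560 bytes, so three blocks
encoded = stored_blocks(data)
decoded = zlib.decompress(encoded, wbits=-15)  # wbits=-15 selects raw deflate
```

In the full proposal, the fixed per-block stride would also have to include the size of the interleaved metadata blocks, which this sketch leaves out.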

When served by an HTTP server the stream can use the deflate or gzip Content-Encoding: to be transparently processed by any standard naive client. The end user will not even see the wrapper file or need to manually gunzip it. An aware client (e.g. an extension to curl) can use the extra data to verify the stream as it is downloaded.

orent avatar May 08 '22 10:05 orent

Have you looked at the --outboard command line flag? I think it achieves what you want here in a simpler way. The downside is that you need two streams, so it doesn't work well in shell pipelines, but my hope is that that's not a problem in regular code.

oconnor663 avatar May 08 '22 17:05 oconnor663

https://github.com/oconnor663/bao#outboard-mode

oconnor663 avatar May 08 '22 17:05 oconnor663

I am aware that a separate verification metadata stream can be used. But this is not the default - and for a good reason.

What I am suggesting is a tweak to the “in board” interleaved format that makes it possible to decode (without verification) using a ubiquitous tool.

orent avatar May 08 '22 20:05 orent

I don't think it would make sense to support this as a first class feature of Bao, but it wouldn't be too hard to write a separate utility that converts between the standard inline format and this deflate-based version. The main thing you need is a function to tell you how many parent nodes come before each chunk, which already exists in the Bao code. We could make it public, or you could just copy/paste it. (It's a few lines of arithmetic, and easy to test, though kind of tricky to get right the first time.)
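
For a rough idea of what that arithmetic could look like, here is a Python sketch. It is not copied from the Bao source. It assumes a pre-order layout (each parent node is written before its children) and a tree where the left subtree always holds the largest power-of-two number of chunks that still leaves at least one chunk for the right. The function names are hypothetical. The second function is a slow recursive reference, included only to check the closed form.

```python
def parents_before_chunk(chunk_index: int, total_chunks: int) -> int:
    """Number of parent nodes written immediately before the given chunk."""
    remaining = total_chunks - chunk_index
    # Height of the largest subtree that could start at this chunk.
    starting_bound = (remaining - 1).bit_length()
    if chunk_index == 0:
        return starting_bound
    # A chunk can only be the leftmost leaf of subtrees aligned to its index.
    trailing_zeros = (chunk_index & -chunk_index).bit_length() - 1
    return min(starting_bound, trailing_zeros)


def parents_before_chunk_reference(total_chunks: int) -> list:
    """Brute-force check: walk the tree in pre-order and count parents."""
    counts = []
    pending = 0

    def walk(n: int) -> None:
        nonlocal pending
        if n == 1:
            counts.append(pending)  # parents emitted since the previous chunk
            pending = 0
            return
        pending += 1  # the parent node comes before both children
        left = 1 << ((n - 1).bit_length() - 1)  # largest power of two below n
        walk(left)
        walk(n - left)

    walk(total_chunks)
    return counts


# Example: 5 chunks are laid out as P P P c0 c1 P c2 c3 c4.
layout_5 = [parents_before_chunk(i, 5) for i in range(5)]
```

Comparing the closed form against the recursive walk over many chunk counts is a cheap way to catch the off-by-one mistakes this kind of code tends to attract.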

One concern I'd have is that, since Bao is a security tool, we want to be really careful about any encoding that different clients might interpret in different ways. I'm not familiar enough with the deflate format to know what to look for here.

oconnor663 avatar May 08 '22 21:05 oconnor663