bao
Idea: deflate compatible file format
One thing that limits the applicability of streaming verification is the use of a non-standard format. I believe it could help adoption if the same unmodified file or URL could be used for both verified and unverified streaming.
This can be done by masquerading as a deflate stream.
The stream would consist of alternating blocks: uncompressed stored blocks (block type 00) and "compressed" dynamic-Huffman blocks (block type 10) that contain only a fake Huffman table carrying the verification hashes and are not followed by any actual data encoded using that table. Any standard deflate decoder could process this interleaved stream into the original data. The block sizes can be fixed and deterministic, which supports random access to any position.
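To make the stored-block half of this concrete, here is a minimal Python sketch. It only frames the data as fixed-size stored blocks and checks that a stock decoder recovers it; the hash-carrying Huffman blocks are deliberately left out, because building a table that every decoder accepts is the hard part. The names `stored_block`, `wrap_stored`, and `payload_offset`, and the 1024-byte block size, are illustrative assumptions and not part of Bao.

```python
import struct
import zlib

HEADER_LEN = 5  # 1 byte holding BFINAL/BTYPE plus padding, then LEN and NLEN


def stored_block(payload: bytes, final: bool) -> bytes:
    """Frame payload as one deflate stored block (BTYPE=00)."""
    if len(payload) > 0xFFFF:
        raise ValueError("a stored block holds at most 65535 bytes")
    first = b"\x01" if final else b"\x00"  # bit 0 is BFINAL, bits 1-2 are BTYPE
    # LEN and its one's complement NLEN, both little-endian 16-bit values.
    return first + struct.pack("<HH", len(payload), len(payload) ^ 0xFFFF) + payload


def wrap_stored(data: bytes, block_size: int = 1024) -> bytes:
    """Split data into fixed-size stored blocks; the last one is marked final."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    last = len(chunks) - 1
    return b"".join(stored_block(c, i == last) for i, c in enumerate(chunks))


def payload_offset(block_index: int, block_size: int = 1024) -> int:
    """Byte offset of a block's payload in a stream made only of full stored blocks."""
    return block_index * (HEADER_LEN + block_size) + HEADER_LEN


data = bytes(range(256)) * 10  # 2560 bytes: two full blocks and one short block
stream = wrap_stored(data)
# wbits=-15 selects raw deflate, with no zlib or gzip wrapper.
assert zlib.decompress(stream, wbits=-15) == data
```

With hash blocks interleaved, the offset arithmetic would also have to account for their sizes, which is why they need to be deterministic as well.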
When served by an HTTP server, the stream can use a deflate or gzip Content-Encoding so that any standard, naive client processes it transparently. The end user will not even see the wrapper file or need to gunzip it manually. An aware client (e.g. an extension to curl) can use the extra data to verify the stream as it is downloaded.
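One detail worth sketching: neither content coding is a bare deflate stream. The gzip coding adds a header plus a CRC-32 and length trailer, and HTTP defines the deflate coding as the zlib container (a two-byte header plus an Adler-32 trailer). The Python sketch below wraps an existing raw deflate body in each container. `gzip_wrap` and `zlib_wrap` are hypothetical helper names, and the raw body here comes from zlib at level 0 (which emits stored blocks) purely as a stand-in.

```python
import gzip
import struct
import zlib


def gzip_wrap(raw_deflate: bytes, original: bytes) -> bytes:
    """Wrap a raw deflate body in a minimal gzip container."""
    # Magic, CM=8 (deflate), no flags, mtime=0, XFL=0, OS=255 (unknown).
    header = b"\x1f\x8b\x08\x00" + b"\x00\x00\x00\x00" + b"\x00\xff"
    # CRC-32 and length (mod 2**32) of the uncompressed data, little-endian.
    trailer = struct.pack("<II", zlib.crc32(original), len(original) & 0xFFFFFFFF)
    return header + raw_deflate + trailer


def zlib_wrap(raw_deflate: bytes, original: bytes) -> bytes:
    """Wrap a raw deflate body in a zlib container (HTTP's 'deflate' coding)."""
    header = b"\x78\x01"  # CM=8, 32 KiB window, no preset dictionary
    trailer = struct.pack(">I", zlib.adler32(original))  # big-endian Adler-32
    return header + raw_deflate + trailer


original = b"hello, verified streaming\n" * 100
# Stand-in raw deflate body; level 0 makes zlib emit stored blocks.
compressor = zlib.compressobj(0, zlib.DEFLATED, -15)
raw = compressor.compress(original) + compressor.flush()

assert gzip.decompress(gzip_wrap(raw, original)) == original
assert zlib.decompress(zlib_wrap(raw, original)) == original
```

Both trailers are checksums over the whole payload, so a converter has to see all the data (or know the checksums in advance) before it can finish the wrapper.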
Have you looked at the --outboard command line flag? I think it achieves what you want here in a simpler way. The downside is that you need two streams, so it doesn't work well in shell pipelines, but my hope is that that's not a problem in regular code.
https://github.com/oconnor663/bao#outboard-mode
I am aware that a separate verification metadata stream can be used. But this is not the default - and for a good reason.
What I am suggesting is a tweak to the “in board” interleaved format that makes it possible to decode (without verification) using a ubiquitous tool.
I don't think it would make sense to support this as a first class feature of Bao, but it wouldn't be too hard to write a separate utility that converted between the standard inline format and this deflate-based version. The main thing you need is a function to tell you how many parent nodes come before each chunk, which already exists in the Bao code. We could make it public, or you could just copy/paste it. (It's a few lines of arithmetic, and easy to test, though kind of tricky to get right the first time.)
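For illustration, here is a rough Python sketch of that arithmetic. It is not Bao's actual code. It assumes the pre-order layout (each parent node is written before its left and right subtrees) and assumes the left subtree always holds the largest power-of-two number of chunks that is strictly less than the total. Both function names are hypothetical. The slow tree walk is there only to check the closed form.

```python
def parents_before_chunk_reference(chunk_index: int, total_chunks: int) -> int:
    """Walk the tree and count the nodes whose subtree starts at chunk_index."""
    start, size, parents = 0, total_chunks, 0
    while size > 1:
        left = 1 << ((size - 1).bit_length() - 1)  # largest power of two < size
        if chunk_index == start:
            parents += 1  # this parent node is written right before the chunk
        if chunk_index < start + left:
            size = left  # descend into the left subtree
        else:
            start, size = start + left, size - left  # descend into the right subtree
    return parents


def parents_before_chunk(chunk_index: int, total_chunks: int) -> int:
    """Closed form: limited by the chunks remaining and by the index's alignment."""
    remaining = total_chunks - chunk_index
    starting_bound = (remaining - 1).bit_length()  # height of a tree of `remaining` chunks
    if chunk_index == 0:
        return starting_bound
    interior_bound = (chunk_index & -chunk_index).bit_length() - 1  # trailing zero bits
    return min(starting_bound, interior_bound)


# Four chunks: two parents before chunk 0, one before chunk 2, none elsewhere.
assert [parents_before_chunk(i, 4) for i in range(4)] == [2, 0, 1, 0]

# Cross-check the closed form against the tree walk for small trees.
for n in range(1, 65):
    for i in range(n):
        assert parents_before_chunk(i, n) == parents_before_chunk_reference(i, n)
```

A converter would use this count to know how many parent nodes to pack into the metadata block that precedes each stored data block.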
One concern I'd have is that, since Bao is a security tool, we want to be really careful about any encoding that different clients might interpret in different ways. I'm not familiar enough with the deflate format to know what to look for here.