bamboo icon indicating copy to clipboard operation
bamboo copied to clipboard

Not hashing the signature

Open snej opened this issue 3 years ago • 3 comments

It occurred to me that, if an entry's signature were not included in the computation of its digest, then a peer could in most cases forget about the signature of an entry, saving 64 bytes.

Why? A connected set of entries is a Merkle tree, so a signature on the latest/root entry authenticates the whole tree. The signatures on the earlier entries are superfluous ... except inasmuch as they’re required in order to verify those entries' digests. If they weren’t, one could simply forget them and still have an authenticated set of entries.

The only negative effect is that the cert pool often becomes larger. If I ask a peer for entry N which is newer than any entry I have, and the peer doesn’t have N's signature anymore, it also has to send me a chain of entries leading down to N from the nearest entry whose signature it still knows.

Of course you can mitigate this by leaving the signatures on some intermediate nodes, like ones at higher levels in the lipmaa tree (13, 40, ...)

With this optimization, you can get rid of nearly all the entry metadata in local storage, because it can be recomputed when needed while syncing:

  • For entries that aren’t the latest, or roots of a disconnected sub graph, you can forget the signature.
  • Link hashes that point to other entries you store can be forgotten and recomputed when needed.
  • The payload hash/size can be omitted if you’re storing the payload.
  • And of course the author key and log ID can be factored out (e.g. replaced with a reference to a common database row that stores them.)

snej avatar Oct 11 '20 17:10 snej

Interesting idea, I will certainly put some thought into it the next time I revisit bamboo and related projects. It would make replication a bit more complicated, but probably nothing that could be handled. Although I cannot immediately tell how it would interact with replicating entries in reverse chronological order as allowed by https://github.com/AljoschaMeyer/bamboo-point2point

CC https://github.com/keks https://github.com/arj03

AljoschaMeyer avatar Oct 12 '20 14:10 AljoschaMeyer

To be a bit more explicit about the savings:

A complete log from sequences 1–N can be stored as nothing more than an array of N payloads, plus the signature and end_of_log flag of entry N, plus the author's public key and log_id. All the other metadata can be recomputed lazily. That means the minimal space overhead of Bamboo would be O(1), between 129-140 bytes.

If the log isn't complete, you need some extra metadata in order to reconstitute the entries you do have:

  • For any entry whose payload is missing, its payload size and hash need to be stored (~65-75 bytes.)
  • For any entry that's needed only to connect other entries, only its hash needs to be stored (64 bytes.)
  • For any entry that's entirely missing, but which is directly linked to by existing entries, its hash needs to be stored (64 bytes.) This is just to store the backlink and lipmaa_link properties of those entries.

So you end up storing just the payloads you're interested in, some payload hashes for entries your peers might be interested in, and a skeletal tree of 64-byte hashes to connect all of this together and to entry 1. Plus that small per-log metadata. That's really little overhead.

If you push a partial log to a peer, you have to do some work recomputing hashes. Probably not too many, though, if you're sending mostly a connected subtree. And of course you can cache unnecessary hashes to save time then.

snej avatar Oct 12 '20 18:10 snej

And just for completeness: by cashing the hashes of the entries on the "spine" (those of sequence numbers of the form ((3^k) - 1) / 2), one adds O(log(n)) space overhead but reduces the time for recomputing any uncached hash to O(log(n)).

The space savings for databases which store more than just a single branch of a long seem to be similar to those you listed for a single-branch database.

AljoschaMeyer avatar Oct 13 '20 09:10 AljoschaMeyer