hyperdrive

Version Feed Proposal

Open: pfrazee opened this issue 8 years ago • 7 comments

This proposal relates to the open PR, #102. It's mostly @mafintosh's idea, so if you don't like it, you can blame him.

Requirements

  • Version Listing. It's useful to quickly find the checkpoint history of a dat.
  • Historic Lookups. We need to be able to quickly look up a file at a given checkpoint label (eg 1.0.0).

Issues

Writing checkpoint messages into the metadata log means you need to scan the full log to find a version, because the checkpoints are interleaved with every file entry. This makes both Version Listing and Historic Lookups slow.

If we can quickly list versions, we can then use the snapshot pointers in the version entries to quickly look up the files in those versions as well.
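To make the cost difference concrete, here is a toy model. Plain Python lists stand in for hypercore feeds, and every name (`FileEntry`, `Checkpoint`, `find_in_metadata_log`, `find_in_version_feed`) is hypothetical rather than a hyperdrive API. The `reads` counter approximates how many feed entries each approach has to fetch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class FileEntry:
    path: str  # hypothetical: a file write recorded in the metadata log


@dataclass
class Checkpoint:
    name: str    # checkpoint label, e.g. "1.0.0"
    mindex: int  # metadata-log seq the checkpoint points at


Entry = Union[FileEntry, Checkpoint]


def find_in_metadata_log(log: List[Entry], name: str) -> Tuple[Optional[Checkpoint], int]:
    """Checkpoints interleaved with file entries: worst case reads the whole log."""
    reads = 0
    for entry in log:
        reads += 1
        if isinstance(entry, Checkpoint) and entry.name == name:
            return entry, reads
    return None, reads


def find_in_version_feed(feed: List[Checkpoint], name: str) -> Tuple[Optional[Checkpoint], int]:
    """Separate version feed: only checkpoints, so reads scale with the version count."""
    reads = 0
    for cp in feed:
        reads += 1
        if cp.name == name:
            return cp, reads
    return None, reads


# 10,000 file writes, with a checkpoint after the 5,000th and the 10,000th.
log: List[Entry] = []
versions: List[Checkpoint] = []
for i in range(10_000):
    log.append(FileEntry(f"/file{i}.txt"))
    if i in (4_999, 9_999):
        cp = Checkpoint(name="1.0.0" if i == 4_999 else "2.0.0", mindex=len(log) - 1)
        log.append(cp)
        versions.append(cp)

print(find_in_metadata_log(log, "2.0.0")[1])       # 10002
print(find_in_version_feed(versions, "2.0.0")[1])  # 2
```

In this model, finding "2.0.0" touches all 10,002 log entries in the interleaved design but only 2 entries in the separate version feed.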

Proposal

In the first message of a live hypercore, the content feed is announced. This proposal will update that message to also announce a version feed. The new Index schema:

message Index {
  optional bytes content = 1;
  optional bytes versions = 2;
}

If a metadata feed does not announce a versions feed in the initial Index message, then it is an "unversioned dat." An unversioned dat still has the implicit versioning of snapshot SHAs, but it can't label snapshots with checkpoint messages.

The version feed consists of checkpoint messages. Each checkpoint message includes the following data:

  • name. The name of the checkpoint. Must be unique to the feed.
  • timestamp. The author's wall-clock time when the checkpoint is created.
  • description. Optional string describing the checkpoint.
  • content. The SHA of the content feed at the time of the checkpoint.
  • metadata. The SHA of the metadata feed at the time of the checkpoint.
  • mindex. The sequence number of the latest metadata-feed message at the time of the checkpoint.
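A minimal sketch of a checkpoint record carrying the fields listed above, plus a historic lookup that uses mindex as its snapshot pointer. The proposal doesn't say how files are indexed, so the per-path `MetadataIndex` (sorted seq lists searched with `bisect`) is a hypothetical stand-in, as are `VersionFeed` and the method names. Hashes are stubbed with zero bytes.

```python
import bisect
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Checkpoint:
    name: str          # must be unique within the version feed
    timestamp: float   # author's wall-clock time, seconds since the epoch
    content: bytes     # hash of the content feed at checkpoint time
    metadata: bytes    # hash of the metadata feed at checkpoint time
    mindex: int        # seq of the latest metadata message at checkpoint time
    description: Optional[str] = None


class VersionFeed:
    """Hypothetical append-only list of checkpoints that rejects duplicate names."""

    def __init__(self) -> None:
        self.entries: List[Checkpoint] = []
        self._by_name: Dict[str, int] = {}

    def append(self, cp: Checkpoint) -> None:
        if cp.name in self._by_name:
            raise ValueError(f"checkpoint name {cp.name!r} is already used")
        self._by_name[cp.name] = len(self.entries)
        self.entries.append(cp)

    def get(self, name: str) -> Checkpoint:
        return self.entries[self._by_name[name]]


class MetadataIndex:
    """Hypothetical in-memory index: path -> sorted list of metadata seqs."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []  # (path, value) per metadata message
        self.seqs: Dict[str, List[int]] = {}

    def write(self, path: str, value: str) -> int:
        seq = len(self.log)
        self.log.append((path, value))
        self.seqs.setdefault(path, []).append(seq)  # appended in order, so stays sorted
        return seq

    def lookup_at(self, path: str, mindex: int) -> Optional[str]:
        """Value of `path` as of metadata seq `mindex` (inclusive), or None."""
        seqs = self.seqs.get(path, [])
        i = bisect.bisect_right(seqs, mindex)  # number of writes at or before mindex
        return self.log[seqs[i - 1]][1] if i else None


meta = MetadataIndex()
versions = VersionFeed()
meta.write("/readme.md", "v1 text")
last = meta.write("/index.js", "v1 code")
versions.append(Checkpoint("1.0.0", time.time(), bytes(32), bytes(32), mindex=last))
meta.write("/readme.md", "v2 text")

print(meta.lookup_at("/readme.md", versions.get("1.0.0").mindex))  # v1 text
```

The version feed answers "where was 1.0.0?" with a single record, and mindex then bounds the file lookup, so the later "v2 text" write is ignored.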

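This proposal is 100% backwards compatible: the versions field is optional, so existing metadata feeds that don't set it remain valid as unversioned dats.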
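The compatibility argument leans on how protobuf treats unknown fields. Assuming the Index message is serialized with the standard protobuf wire format (the schema is written in protobuf syntax), a reader that predates the proposal skips field 2 and still finds the content key, while a newer reader treats a missing field 2 as an unversioned dat. The hand-rolled helpers below (`encode_index`, `decode_fields`, `old_reader`, `new_reader`) are hypothetical and only handle length-delimited fields; a real implementation would use a protobuf library.

```python
from typing import Dict, Optional, Set, Tuple


def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint: 7 bits per byte, high bit = more bytes follow."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, position after the varint)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_index(content: bytes, versions: Optional[bytes] = None) -> bytes:
    """Serialize `message Index` with wire type 2 (length-delimited) fields."""
    out = bytearray()
    for field_no, value in ((1, content), (2, versions)):
        if value is not None:  # unset optional fields are simply omitted
            out += encode_varint((field_no << 3) | 2)
            out += encode_varint(len(value)) + value
    return bytes(out)


def decode_fields(buf: bytes, known: Set[int]) -> Dict[int, bytes]:
    """Read length-delimited fields, skipping any field number not in `known`."""
    fields: Dict[int, bytes] = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_no, wire_type = key >> 3, key & 0x7
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = decode_varint(buf, pos)
        value, pos = buf[pos:pos + length], pos + length
        if field_no in known:
            fields[field_no] = value
    return fields


def old_reader(buf: bytes) -> bytes:
    """Pre-proposal reader: only knows `content = 1`."""
    return decode_fields(buf, {1})[1]


def new_reader(buf: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Post-proposal reader: a missing `versions` field means an unversioned dat."""
    fields = decode_fields(buf, {1, 2})
    return fields[1], fields.get(2)


content_key, versions_key = bytes([1]) * 32, bytes([2]) * 32  # stand-in keys
versioned = encode_index(content_key, versions_key)
unversioned = encode_index(content_key)

print(old_reader(versioned) == content_key)            # True: field 2 is skipped
print(new_reader(unversioned) == (content_key, None))  # True: an unversioned dat
```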

pfrazee, Dec 27 '16 21:12

cc @maxogden @substack

pfrazee, Dec 27 '16 21:12

So the tl;dr is that this is a third, optional feed that we can read into memory to resolve labels to hashes or to list history? If there are, say, 1 million entries in this feed (a million labels), how fast could we look up an arbitrary version?

max-mapper, Dec 30 '16 21:12

Oh, good question. If we assume the labels are monotonic (semvers), could we use a B-tree? @mafintosh

pfrazee, Dec 30 '16 21:12
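Under the monotonic-label assumption, the version feed is already sorted in append order, so plain binary search over it (swapped in here for the B-tree suggested above) answers a lookup in O(log n) random-access reads: at most 20 reads for a million labels. `parse_semver`, `find_version`, and `SyntheticFeed` are hypothetical; `get(i)` stands in for fetching entry i from the feed, and semver pre-release tags are not handled.

```python
from typing import Callable, Optional, Tuple


def parse_semver(label: str) -> Tuple[int, int, int]:
    """'1.2.3' -> (1, 2, 3); pre-release and build tags are not handled."""
    major, minor, patch = (int(part) for part in label.split("."))
    return major, minor, patch


def find_version(length: int, get: Callable[[int], str], label: str) -> Optional[int]:
    """Binary-search a feed whose labels were appended in increasing order.

    `get(i)` stands in for a random-access read of entry i from the feed.
    Returns the matching entry index, or None if the label is absent.
    """
    target = parse_semver(label)
    lo, hi = 0, length - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        current = parse_semver(get(mid))
        if current == target:
            return mid
        if current < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


class SyntheticFeed:
    """Hypothetical feed of 1,000,000 increasing labels: 0.0.0 ... 99.99.99."""

    def __init__(self) -> None:
        self.reads = 0

    def get(self, i: int) -> str:
        self.reads += 1  # count every simulated feed read
        return f"{i // 10_000}.{(i // 100) % 100}.{i % 100}"


feed = SyntheticFeed()
index = find_version(1_000_000, feed.get, "42.17.5")
print(index, feed.reads <= 20)  # 421705 True
```

The catch is that monotonicity has to actually hold: a backported patch such as 1.2.4 appended after 2.0.0 breaks the append-order sort, whereas a B-tree keyed by label would tolerate arbitrary insertion order.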

So, @mafintosh and I were discussing the lookup optimization he's adding right now†, and @mafintosh realized that we could just as easily put version/checkpoint information in a file within the dat rather than in a separate feed. That would be simpler to implement and would avoid protocol changes.

I played with this idea 6 months ago with https://github.com/pfrazee/bdat-versions-file. We could create a new file like I did there, or use dat.json.

I'm on board with this. Thoughts? cc @maxogden @joehand @karissa @yoshuawuyts @substack

†adds efficient over-the-network lookup for a specific file

pfrazee, Jan 11 '17 20:01
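A sketch of the versions-file idea, assuming the checkpoints live in a "versions" array inside dat.json. The layout is hypothetical (it is not taken from bdat-versions-file), as are the helper names; each entry reuses name, mindex, and description from the original proposal.

```python
import json
from typing import Dict, List, Optional

# Hypothetical dat.json layout; field names and placement are assumptions.
EXAMPLE_DAT_JSON = """
{
  "title": "my-dat",
  "versions": [
    {"name": "1.0.0", "mindex": 12, "description": "first release"},
    {"name": "1.1.0", "mindex": 40}
  ]
}
"""


def load_versions(dat_json_text: str) -> List[Dict]:
    """Return the checkpoint list, or [] for a dat.json without one."""
    return json.loads(dat_json_text).get("versions", [])


def resolve(versions: List[Dict], name: str) -> Optional[int]:
    """Map a checkpoint label to the metadata seq it points at."""
    for entry in versions:
        if entry["name"] == name:
            return entry["mindex"]
    return None


def add_version(dat_json_text: str, name: str, mindex: int,
                description: Optional[str] = None) -> str:
    """Return updated dat.json text with one more checkpoint appended."""
    doc = json.loads(dat_json_text)
    versions = doc.setdefault("versions", [])
    if any(v["name"] == name for v in versions):
        raise ValueError(f"checkpoint {name!r} already exists")
    entry: Dict = {"name": name, "mindex": mindex}
    if description is not None:
        entry["description"] = description
    versions.append(entry)
    return json.dumps(doc, indent=2)


versions = load_versions(EXAMPLE_DAT_JSON)
print(resolve(versions, "1.0.0"))  # 12
updated = add_version(EXAMPLE_DAT_JSON, "2.0.0", 57, "second release")
print(resolve(load_versions(updated), "2.0.0"))  # 57
```

Because the list is an ordinary file, it travels with the dat and needs no protocol change. One wrinkle in this model: writing dat.json is presumably itself a metadata-log append, so a checkpoint recorded in it can only point at an earlier mindex.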

@pfrazee it sounds very straightforward; I quite like it :sparkles:

yoshuawuyts, Jan 12 '17 02:01

Yeah, this sounds right! It also makes it easy for the UI to integrate, read, and parse.

okdistribute, Jan 12 '17 02:01

Every entry is a version now. We might want to add explicit version snapshots in the future, though, so I'll keep this open.

mafintosh, Apr 09 '17 08:04
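If every metadata entry is a version, checking out a version amounts to replaying the log up to that entry. In the sketch below, `Entry` and `checkout` are hypothetical, a None value marks a deletion, and a version number is taken to be the seq of the last included entry; a real implementation may number versions differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    path: str
    value: Optional[str]  # hypothetical: None marks a deletion


def checkout(log: List[Entry], version: int) -> Dict[str, str]:
    """File tree as of `version`, where version == seq of the last entry included."""
    if not 0 <= version < len(log):
        raise IndexError(f"no entry with seq {version}")
    tree: Dict[str, str] = {}
    for entry in log[: version + 1]:
        if entry.value is None:
            tree.pop(entry.path, None)  # deletion removes the path if present
        else:
            tree[entry.path] = entry.value
    return tree


log = [Entry("/a.txt", "hello"), Entry("/b.txt", "world"), Entry("/a.txt", None)]
print(checkout(log, 1))  # {'/a.txt': 'hello', '/b.txt': 'world'}
print(checkout(log, 2))  # {'/b.txt': 'world'}
```

Explicit version snapshots could then be as small as a mapping from a label to one of these version numbers.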