Proposal: Dat mounts / symlinks (revisited)

Open · pfrazee opened this issue Jun 21 '18 · 36 comments

This idea isn't new, but I've recently realized there's a potential optimization that might make this worth prioritizing.

Let's call this proposal a "mount." It's conceptually simple, like a symlink but for dats. It could apply to both hyperdb and to hyperdrive. It is a pointer which maps a prefix/folder to a prefix/folder of another hyperdb or hyperdrive.

var a = hyperdrive(...)
var b = hyperdrive(...)
a.mount('/foo', b, {path: '/bar'}, (err) => {
  // now: a /foo -> b /bar
  a.readdir('/foo', (err, fooListing) => {
    b.readdir('/bar', (err, barListing) => {
      // fooListing and barListing contain the same entries
    })
  })
})

When a hyperdb/drive is replicated, the client would request any mounted dbs/drives using the same connection & swarm as the parent db/drive. This means that a mount does not have to incur additional swarming overhead. (This is the optimization I was referring to.)
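
For reference, this is the mechanism the optimization would lean on: hypercore can already multiplex several feeds over one protocol stream. A rough sketch, assuming hypercore's replicate({ stream }) option behaves the way I remember (mountedKey is a placeholder for whatever key the mount entry records):

var hypercore = require('hypercore')
var ram = require('random-access-memory')

var parent = hypercore(ram)              // feed backing the parent drive
var mounted = hypercore(ram, mountedKey) // mountedKey: placeholder for the key stored in the mount entry

// replicate the parent over whatever transport the swarm gave us
var stream = parent.replicate({ live: true })

// instead of joining a second discovery swarm for the mounted feed,
// open it as an additional channel on the same protocol stream
mounted.replicate({ stream: stream, live: true })

// stream is then piped to/from the peer found via the parent's swarm, e.g.:
// stream.pipe(socket).pipe(stream)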

Mounting is generally useful to applications. It has the following uses:

  1. Mapping upstream dependencies. For instance, a /vendor directory could be populated with mounts to libraries.
  2. Collaboration. This does not enable users to modify the same data (that's multi-writer), but it does make it possible to mount "user-owned" directories which are easy to discover and leverage. For instance, a /users directory could be populated with mounts to a site's active users; /users/bob could point to Bob's own dat.
  3. Data-clustering. Because mounted dats can be shared over the parent's swarm, the number of total active swarms can be reduced.

pfrazee avatar Jun 21 '18 16:06 pfrazee

I really like this!

Some thoughts:

  • If it's replicating through the existing feed, how will it get updates to the feed from the owner if they're not part of the same discovery swarm?
  • When you download() a directory containing mounts, will this download any mounts that they have as well (recursion)?
  • Any concerns with cyclic mounts?
  • Will archive.stat have a new field signifying that this is a mount? Or will this use the linkname property somehow?

RangerMauve avatar Jun 21 '18 17:06 RangerMauve

If it's replicating through the existing feed, how will it get updates to the feed from the owner if they're not part of the same discovery swarm?

Yeah good catch, I've been thinking about that but don't have an answer yet. I think it's pretty important that we cluster the swarms to make this perform the way it needs to, but I'm not sure how you avoid a segmented network (so to speak).

When you download() a directory containing mounts, will this download any mounts that they have as well (recursion)?

That's a good question. I'd be inclined to say any recursive behavior should either not recurse into mounts by default but have flags to do so, or never recurse into mounts ever.

Any concerns with cyclic mounts?

I just dealt with this for symlinks in the local filesystem; it's actually not too difficult to detect and abort recursion if this occurs. (You just have to maintain a set of folders you visit, which means there's a memory cost.)
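
For illustration, a minimal sketch of that visited-set approach over a toy model of the mount graph (the drive objects here are plain { key, mounts } objects, not real hyperdrives):

// toy model: { key: 'hex string', mounts: { '/path': otherDrive, ... } }
function listMounts (drive, visited) {
  visited = visited || new Set()
  if (visited.has(drive.key)) return []   // cycle detected: we've already walked this drive
  visited.add(drive.key)
  var out = []
  Object.keys(drive.mounts || {}).forEach(function (path) {
    var target = drive.mounts[path]
    out.push({ path: path, key: target.key })
    out = out.concat(listMounts(target, visited))
  })
  return out
}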

Will archive.stat have a new field signifying that this is a mount? Or will this use the linkname property somehow?

I do think we'd need some kind of field on the stat output.
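
Something like this, maybe, reusing a from the example above (the mount field is purely hypothetical; it's not part of hyperdrive's Stat output today):

a.stat('/foo', function (err, st) {
  if (err) throw err
  if (st.mount) {
    // hypothetical: mount entry carrying the target's key, path and pinned version
    console.log('mount ->', st.mount.key, st.mount.path, st.mount.version)
  }
})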

pfrazee avatar Jun 21 '18 18:06 pfrazee

I think it's pretty important that we cluster the swarms to make this perform the way it needs to, but I'm not sure how you avoid a segmented network (so to speak).

Maybe when you commit to "seeding" a dat, you join the networks for all the mounts?

I think this shows that there should be a way to notify the application of all mounted dats. Either something like archive.mounts((err, {mountPath, path, key}) => {}) or passing onNewMount(({mountPath, path, key}) => {}) as a listener.
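
For example, a seeding tool could combine those two ideas. This is only a sketch: archive.mounts() is the API proposed above (not something hyperdrive has today), and archive stands for the parent hyperdrive being seeded.

var discovery = require('discovery-swarm')
var swarm = discovery()

// when committing to seed `archive`, also join the swarms of everything it mounts
archive.mounts(function (err, mounts) {
  if (err) throw err
  mounts.forEach(function (m) {
    swarm.join(m.key)   // in practice you'd join the discovery key derived from m.key
  })
})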

RangerMauve avatar Jun 21 '18 18:06 RangerMauve

Maybe when you commit to "seeding" a dat, you join the networks for all the mounts?

That's a thought. I'm sure we'll figure something out.

I think this shows that there should be a way to notify the application of all mounted dats. Either something like archive.mounts((err, {mountPath, path, key}) => {}) or passing onNewMount(({mountPath, path, key}) => {}) as a listener.

Probably, yeah, because the replication code would need it.

pfrazee avatar Jun 21 '18 18:06 pfrazee

One other thought: it might be useful to be able to pin the version of the upstream.

pfrazee avatar Jun 21 '18 18:06 pfrazee

pin the version of the upstream

What about using a URL for mounting instead of a separate key / path? i.e. .mount("/foo", "dat://keyhere/bar")

RangerMauve avatar Jun 21 '18 18:06 RangerMauve

@RangerMauve That's really just a question of API. The URL is a serialization of the same info. In the node hyperdrive module you need to pass in the hyperdrive instance of the mount, because the hyperdrive module has no way to load other hyperdrives automatically. In Beaker, that's not the case.

pfrazee avatar Jun 21 '18 18:06 pfrazee

That's a good point.

you need to pass in the hyperdrive instance of the mount

Should the API look something like a.mount('/foo', b, {path: '/bar', version: 420}, etc)? Instead of passing in b.key, it should be b, and the hyperdrive can get the key itself since it needs access to the files in the drive.
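
Pulling that together, the proposed call would look something like this (all still hypothetical API, including the version option; the file name is just illustrative):

var hyperdrive = require('hyperdrive')
var ram = require('random-access-memory')

var a = hyperdrive(ram)
var b = hyperdrive(ram)

// mount b's /bar at a's /foo, pinned to version 420 of b
a.mount('/foo', b, { path: '/bar', version: 420 }, function (err) {
  if (err) throw err
  // reads under /foo now resolve against b's /bar at version 420
  a.readFile('/foo/readme.md', 'utf-8', console.log)
})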

RangerMauve avatar Jun 21 '18 18:06 RangerMauve

Oh, yes, that was a typo. I edited the example in the original.

pfrazee avatar Jun 21 '18 18:06 pfrazee

I was thinking about this too lately and quite like your proposal. One point might need some more thought: if implemented at the hyperdb level (which opens many doors, so ++), then we might also have to deal with different value encodings of different dbs (i.e. what happens if, in your example, hyperdb a is a hyperdrive, but b has JSON values?). So we would either need to deal with that somehow, add a check to see if the two dbs have the same value encodings, or leave it up to the applications. I think, though, that if handled well it could be a great feature, possibly orthogonal to subhyperdb.

I'm thinking of a use case I have, which I presume will not be uncommon: combining a hyperdrive with another data structure. E.g. a hyperdrive with metadata in a hypergraph, or a forum in DatDB with JSON-encoded values, with the ability to browse uploaded files via regular hyperdrive-based dat tools (e.g. Beaker's filesystem view or the dat CLI), or some yet-to-come dat-based, file-based version control system with embedded issue management.

I think this proposal could allow for that. Or is this out of scope? I don't want to take this over. I am in the process of doing a writeup/proposal to integrate some support for subhyperdb into the surrounding dat tooling (which is the other direction I am exploring).

Frando avatar Jun 22 '18 15:06 Frando

@Frando yeah I think that's a fair point, we should give that some thought. Hypercores are having data structure identifiers added to them (DEP on the way) so it should be possible to semi-automatically resolve those lookups to the correct managing code.

pfrazee avatar Jun 22 '18 15:06 pfrazee

An appealing possible feature of this would be that multi-writer permissions could be scoped to prefixes, without needing to make the multi-writer implementation more complex than it already is. Eg, you could have different writers for different directories. In the context of browsers and executable code, this might have nice security implications (though I haven't thought this all through).

This could be used to "hide" hypercore feeds from metadata analysis; one discovery key would be used to find peers, then you would try to add the "actual" feed as a second channel.

Could wrapper feeds be used to solve the breakage (corruption) and upgrade concerns raised in Issue 11? ('Proposal & Discussion: Hypercore "major version" pointer')

In the hyperdrive case, I think this could be prototyped at the application layer using pointers in dat.json, without requiring any protocol changes. A wrapper library would handle "virtual filesystem" lookups (matching prefixes to a specific hyperdb instance), and would attempt adding additional hypercore feeds to existing connections.
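
A rough sketch of what that wrapper might do (the mounts field in dat.json and all of this routing logic are hypothetical, just to make the idea concrete):

// dat.json could carry something like:
//   { "title": "...", "mounts": { "/vendor/lib": "dat://<key>/some/path" } }

function resolveMount (manifest, path) {
  var mounts = manifest.mounts || {}
  var prefixes = Object.keys(mounts)
  for (var i = 0; i < prefixes.length; i++) {
    var prefix = prefixes[i]
    if (path === prefix || path.indexOf(prefix + '/') === 0) {
      // route everything below the mount point to the target dat URL
      return { url: mounts[prefix] + path.slice(prefix.length), mounted: true }
    }
  }
  return { url: path, mounted: false }
}

// the wrapper library would use this to route readFile/readdir calls to the right
// hyperdrive instance, and would separately ask the replication layer to add the
// target feeds as extra channels on existing connections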

My gut instinct is always to keep the protocol and protobuf schemas as simple as possible, so I'd prefer to see this implemented at the application layer.

I also think there is a "semantic burden" of adding complexity to the ecosystem, which makes it harder for users and developers to reason about how dat works. I think the various git sub-repository systems all suffer from this: all applications and user experiences built on top of git have to handle the "special case" of sub-modules; CI automation, reproducible builds, branch management, tarball generation, etc, etc. I'm skeptical that the complexity with git was due to there being competing implementations (I think there is an inherent increase in complexity; consider that in addition to the submodule commit hashes, the number of submodules and where they point to can change at any time, making it hard to reason about what a given repository actually is or contains), but I could be wrong. The power and simplicity of the UNIX virtual file system sets a strong precedent that this pattern would be worth the complexity though.

bnewbold avatar Jun 27 '18 06:06 bnewbold

An appealing possible feature of this would be that multi-writer permissions could be scoped to prefixes, without needing to make the multi-writer implementation more complex than it already is.

Yes. Elaborating on this for posterity: current discussion around multi-writer has been looking at dat-wide permission policies. Mounts would make it possible to create prefix-scoped permission schemes by using multiple dats while still using dat-wide permission policies.

This could be used to "hide" hypercore feeds from metadata analysis; one discovery key would be used to find peers, then you would try to add the "actual" feed as a second channel.

That occurred to me RE the discussion around reader policy. I'm not sure I'd state it as a goal, however, because of the other discussion around possibly needing to swarm the archives individually at times.

Could wrapper feeds be used to solve the breakage (corruption) and upgrade concerns raised in Issue 11? ('Proposal & Discussion: Hypercore "major version" pointer')

I.e. by mounting to root? Probably not. If the wrapper feed gets corrupted, then there'd be no wrapper to fix that, whereas the version pointer in the other proposal has semantics which cannot be corrupted. (You can always fix the situation by publishing a pointer with a higher version... for some value of "always." Probably need to limit the number of major versions allowed before we end up with 1024-bit majors.)

In the hyperdrive case, I think this could be prototyped at the application layer using pointers in dat.json, without requiring any protocol changes. A wrapper library would handle "virtual filesystem" lookups (matching prefixes to a specific hyperdb instance), and would attempt adding additional hypercore feeds to existing connections.

I'd be okay with this, but we'll also need a way to talk with the replication layer so that it can exchange the additional cores.

I also think there is a "semantic burden" of adding complexity to the ecosystem, which makes it harder for users and developers to reason about how dat works. ... The power and simplicity of the UNIX virtual file system sets a strong precedent that this pattern would be worth the complexity though.

I had a similar concern, and that's why I've been a "no" on this for a while. The things that ultimately swayed me were:

  1. realizing we can merge the swarms of mounted archives, leading to a performance improvement that I think could be fairly significant;
  2. thinking more about how multi-user collaboration should work, and realizing that segmented namespaces are always going to be a key portion of any data model;
  3. seeing the broad applicability for vendoring upstream dependencies, especially with version-pinning.

pfrazee avatar Jun 27 '18 16:06 pfrazee

Love this idea

mafintosh avatar Jun 27 '18 17:06 mafintosh

A spitball thought: if we had an excellent FUSE (userspace filesystem mounting) implementation, would we even need this? Or, staying in some non-operating-system API, should we explode the generality here and allow mounting many different types of things into the same namespace? E.g. git repositories, hyperdrives, tar archives, HTTP remote directories, local filesystem folders (similar to the random-access-storage API).

And a design reference: would this address the design needs that sciencefair has? cc: @blahah for feedback. IIRC they wanted to have a stable root hypercore to reference, but be able to swap out sub-hyperdrives (so they could refactor content schema without being burdened with full storage history data size?), and work around performance/scaling issues with many files in non-hyperdb hyperdrives.

bnewbold avatar Jun 27 '18 18:06 bnewbold

I appreciate the spitballing, but I'm a hard no on that idea. Adding other protocol mounts is a ton of added complexity we don't need to consider right now. Each one of those alternative mount targets is a protocol that a dat-based platform would need to support, and each one is going to have its own characteristics that would require additional consideration. (For instance, a git repo has a ton of semantics around version control, while HTTP is a remote target.)

The door isn't closed to it (I plan to use URLs to reference the mounts in dat.json in my proposed implementation) but I just don't think that's a productive space to consider at this point.

pfrazee avatar Jun 27 '18 18:06 pfrazee

Another thought: re-using the existing connection for additional hypercore feeds (instead of doing a separate swarm) could result in discovery and history problems in some cases. Eg:

  1. hyperdrive ABC created with sub-module XYZ mounted at /vendor. Only 1-2 peers swarm.
  2. /vendor is replaced with sub-module LMN.
  3. many peers swarm, but don't pull full history (only hypercore feeds ABC and LMN)
  4. an observer interested in full history will have trouble finding a peer for ABC

This is sort of an existing issue with hypercore (how to find peers with full history, vs. just recent history?), and not all applications really care about having full history. A solution would be to have nodes join the swarm for sub-feeds, even if they are mostly using the efficiency trick Paul mentions (opening new channels on existing connection, instead of discovering peers the regular way); this would ensure that the historical sub-modules are discover-able. Maybe a user-agent configuration option if there are discovery scaling issues ("the number of total active swarms can be reduced"); this may require real-world experimentation and testing to sort out? User agents should also notice if they aren't able to find sub-feeds after several attempts of opening channels with existing peers and go through the full discovery process (aka, I think that mechanism is a great optimization, but there should be an elegant fallback).
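
That fallback could be as simple as something like this sketch, where tryExistingPeers and joinFullSwarm are hypothetical stand-ins for whatever the user agent's networking layer actually exposes:

var MAX_ATTEMPTS = 3   // arbitrary; as noted above, this probably needs real-world tuning

function fetchSubFeed (key, attempt, cb) {
  attempt = attempt || 0
  // first try opening the sub-feed as an extra channel on connections we already have
  tryExistingPeers(key, function (err, found) {
    if (!err && found) return cb(null)
    if (attempt + 1 < MAX_ATTEMPTS) return fetchSubFeed(key, attempt + 1, cb)
    joinFullSwarm(key)   // elegant fallback: do regular discovery for the sub-feed's own swarm
    cb(null)
  })
}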

Platforms like hashbase.io will need to support automatically fetching sub-feeds; should they tie these together, or auto-create a new hypercore "item"/path? Needs design thought but solvable. Such platforms would be natural places to store full history (and broadcast as such for discovery).

@pfrazee could you clarify the "number of total active swarms can be reduced" motivation? Is this to reduce the amount of open TCP/uTP connections beaker has to keep up with (causing battery/wifi waste), or a performance issue with many swarms, or load on discovery servers/networks? I think we should try to minimize all of these, but i'm curious which if any are the most acute today.

bnewbold avatar Jun 27 '18 19:06 bnewbold

@bnewbold you just articulated everything I've been thinking, including the potential approach of having the swarm-manager decide to join the swarm on not-found, or just prioritizing the subfeeds lower once it has to start juggling open connections.

@pfrazee could you clarify the "number of total active swarms can be reduced" motivation?

There's really two things that interest me:

  1. Reducing the number of overall connections
  2. Improving the latency of fetching new data for a given archive (no need to discover peers for each feed individually)

Right now, point 2 is the most valuable, and we actually get that no matter what. Point 1 becomes important once we start scaling.

pfrazee avatar Jun 27 '18 19:06 pfrazee

Will we get change events for mounted dats through the DatArchive API?

RangerMauve avatar Jun 30 '18 20:06 RangerMauve

We'll have to see what the performance looks like if we bubble the events.

pfrazee avatar Jul 02 '18 14:07 pfrazee

Glad this idea got picked up again.

I had a naive implementation of symlinks on top of hyperdrive. It worked, but in order to re-use a swarm to sync all the linked archives, the API became really awkward.

It would be nice to have this built into hypercore/hyperdrive, so we can have a cleaner API.

poga avatar Jul 02 '18 23:07 poga

So, this would be at the hyperdrive level, not hyperdb, right?

RangerMauve avatar Jul 10 '18 21:07 RangerMauve

So, this would be at the hyperdrive level, not hyperdb, right?

It appears that's the plan for now.

pfrazee avatar Jul 10 '18 23:07 pfrazee

I think this could play in well with content addressability.

I looked at the hyperdrive DEP a little and it seems to be pretty sparse at the moment. How would this relate to the Node type?

RangerMauve avatar Jul 11 '18 00:07 RangerMauve

@mafintosh has been working on "strong" links with https://github.com/mafintosh/hypercore-strong-identifier which might be what you're looking for.

pfrazee avatar Jul 11 '18 00:07 pfrazee

I'm also a big :+1: on this. It'd be a very powerful feature, but it does add some non-obvious complexity.

For one, versioning dbs containing many non-versioned symlinks could be confusing. The naive approach here is to store all links in an index, such that you can iterate over them and grab their versions during every call to the parent's version method. Alternatively, one could say that if a link is not pinned to a specific version, then its version should not be included in the parent's.

The latter seems like the right place to start, but it might lead to confusion if that behavior is not obvious to a user (i.e. they might check out the parent at a specific version, expecting a mounted library, say, to be at the correct version). While this obviates the need for the cross-db version computation, it's also limiting, as a user cannot have both a "live" link and the guarantee that checkouts will consistently work as expected (to do so requires checking out all links as well).
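
To make the two options concrete, here's a toy sketch over a hypothetical mountList of { path, target, pinnedVersion } entries (none of this is real hyperdb/hyperdrive API):

// naive approach: the parent's version becomes a composite that includes the
// current version of every mounted drive
function compositeVersion (drive) {
  var versions = { '/': drive.version }
  drive.mountList.forEach(function (m) {
    versions[m.path] = m.pinnedVersion != null ? m.pinnedVersion : m.target.version
  })
  return versions
}

// alternative: only pinned mounts contribute, so "live" mounts are simply not part
// of the parent's version (and won't be rolled back by a checkout)
function pinnedOnlyVersion (drive) {
  var versions = { '/': drive.version }
  drive.mountList.forEach(function (m) {
    if (m.pinnedVersion != null) versions[m.path] = m.pinnedVersion
  })
  return versions
}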

(P.S. Very much looking forward to being able to use this in upstream hyperdb, so I can get rid of the kludgy code in my approach here :wink:)

andrewosh avatar Jul 11 '18 04:07 andrewosh

@pfrazee, re: "strong links"

I'm not sure if those really address the need for content addressability since it's tied to the feed and to the representation of the merkle tree rather than to file contents.

RangerMauve avatar Jul 11 '18 14:07 RangerMauve

@RangerMauve depends on whether you're just looking for a guarantee about what data you receive, or if you're also trying to get deduplication

pfrazee avatar Jul 11 '18 16:07 pfrazee

@andrewosh that's a good observation about versioning. I suppose it's not too different from package.json. If you consistently put @latest on your deps, you'd get the same kind of issue (and of course nobody does that). I don't have a ton of experience with git submodules, but I think those are always pegged to a version and you have to manually update them.

pfrazee avatar Jul 11 '18 16:07 pfrazee

@pfrazee I definitely want deduplication. I think it's one of the things that Dat is lacking at the moment compared to IPFS.

Although de-duplication between multiple Dats would require something fancy in the storage layer, I think that getting it within a hyperdrive is a great first step and will help with stuff like duplicate files and reverted changes. Plus, content addressability being in the hyperdrive will make it easier to make storage layers that do global de-duplication.

Also, content addressability will be useful for quick diffing, which I don't think strong identifiers will be good for, unless I'm misunderstanding something, since the merkle tree will be different per fork.

RangerMauve avatar Jul 11 '18 16:07 RangerMauve