Half-Baked: Peer-to-Peer tunneling (and/or RPC)

Open • bnewbold opened this issue 7 years ago • 5 comments

I think there are two important low-level features we might need to deliver to enable a broader set of applications and use cases: app-independent identity ("personas"), and robust, secure, authenticated peer-to-peer data channels. Both are Hard Problems. A lot of work has been put into the former, and I think there are feasible solutions using some combination of public-key crypto, web of trust, trust-on-first-use, device key signing, etc. Maybe some decentralized Keybase-like thing. For the latter, XMPP, Matrix, and Tor hidden services are good starts. Here, I'm going to propose two new half-baked patterns, as alternatives to the above, that might fit into the hyper-world.

The first idea isn't very solid (but sets up the second), and I think it's been raised previously: store a public identity key in a multi-writer hyperdb, where each of the user's end devices has a separate writer key. Other keys (or at least fingerprints), a profile image, and "impressum"-type contact/profile details could be stored here; most importantly, it would be a way to revoke or update a primary key pair (by updating a public-key entry in the db). Multi-writer doesn't really solve the device-key revocation problem well, though, so this is a hand-wavy solution. Let's call the key in the hyperdb the "public persona key".

The second idea is to have a mechanism for discovering and initiating a connection to one of a persona's devices. The way this would work is that the "sender" (the peer initiating the connection) would need prior knowledge of the "recipient" peer's public persona keys. They would look up the currently active key from the identity hyperdb (note: distinct from the "original"/"source" key of the hyperdb itself, which would be longer-term stable), then generate a discovery key using a protocol name ("hyperpeer" or something, as opposed to "hypercore") and, optionally, a "code word" passed out-of-band from the "recipient". The code word would be used to control initiation access, similar to how unlisted phone numbers or private email addresses work today (in particular, the "[email protected]" pattern): an individual may or may not want to let just anybody initiate connections, so they might only listen on a set of (rotating?) code words, which might be printed on business cards or whatever. Others might want to allow public connections; "recipients" would control which discovery keys to actually broadcast and listen on. Anonymous/ephemeral connections can be made by generating a key pair with no associated "identity" hyperdb. Any trusted recipient device (holding the public/private key pair) can receive connections (making the chance of being online higher); "sending" peers would need to retry until they succeed. Only "receivers" would broadcast to be discoverable; the code-word mechanism above could make DDoS and Sybil attacks harder (i.e., censoring access to a persona; not sure how to prevent this for wide-open personas). Once a peer has been discovered, authentication would be established using the key pair on the "recipient" end. The sender could likely supply their own persona key during the handshake for mutual authentication. The connection would be encrypted similarly to hypercore, though the message format beyond that may be entirely different.
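As a rough illustration of the discovery step, here is a sketch that derives rendezvous topics from a persona key. Hypercore's own discovery key is, as I understand hypercore-crypto's `discoveryKey`, a 32-byte BLAKE2b hash of the string "hypercore" keyed with the feed's public key. The `persona_discovery_key` function, the `b"hyperpeer"` protocol string, and the way a code word is mixed in are all hypothetical: one possible way to realize the idea, not an existing API.

```python
import hashlib
from typing import Optional


def hypercore_discovery_key(public_key: bytes) -> bytes:
    # BLAKE2b-256 over the bytes "hypercore", keyed with the 32-byte public key.
    return hashlib.blake2b(b"hypercore", key=public_key, digest_size=32).digest()


def persona_discovery_key(persona_key: bytes,
                          protocol: bytes = b"hyperpeer",
                          code_word: Optional[str] = None) -> bytes:
    # Hypothetical variant: swap the protocol name and optionally append an
    # out-of-band code word. A recipient would only announce/listen on topics
    # derived from the code words it currently accepts.
    message = protocol
    if code_word is not None:
        message += b"\x00" + code_word.encode("utf-8")
    return hashlib.blake2b(message, key=persona_key, digest_size=32).digest()
```

A recipient who hands out the (made-up) code word "blue-heron" would announce `persona_discovery_key(active_key, code_word="blue-heron")` and stop announcing it when the word is rotated. Note that this only gates discovery: anyone who learns both the persona key and the code word can compute the same topic.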
Either generic TCP-like streams (multiplexed à la websockets?) or protobuf-like RPC (request/response) could be sent down the pipe; either generalizes to the other.
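One way to see why either mode generalizes to the other: a byte stream can carry RPC if each message is framed with a type, a request id, and a length, and the receiver reassembles frames from however the stream happens to be chunked. This sketch assumes a hypothetical 9-byte big-endian header (`kind`, `request_id`, payload length); it is not an existing dat or hypercore wire format.

```python
import struct
from typing import List, Tuple

# Hypothetical header: 1-byte kind, 4-byte request id, 4-byte payload length.
HEADER = struct.Struct(">BII")
REQUEST, RESPONSE = 0, 1


def encode_frame(kind: int, request_id: int, payload: bytes) -> bytes:
    return HEADER.pack(kind, request_id, len(payload)) + payload


class FrameDecoder:
    """Reassembles complete frames from arbitrarily split stream chunks."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[Tuple[int, int, bytes]]:
        self._buf.extend(chunk)
        frames = []
        while len(self._buf) >= HEADER.size:
            kind, request_id, length = HEADER.unpack_from(self._buf, 0)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # wait for the rest of this frame
            frames.append((kind, request_id, bytes(self._buf[HEADER.size:end])))
            del self._buf[:end]
        return frames
```

The request id lets responses be matched to requests even when several calls are in flight. Conversely, a stream can be emulated over request/response by sending ordered chunks as messages.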

Regular hypercore could be tunneled over such a channel. In particular, for RPC or other connections that want to be robust to network disconnects, the peers could immediately establish two private hypercores (one for each direction of transfer) and send messages by appending them to the appropriate feed. In the other direction, we could try to define a datagram-like transport over UDP (e.g., for voice/video applications).
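A toy model of the "two private hypercores" idea: each direction is an append-only log, and each reader remembers the first index it has not yet seen, so a dropped connection only delays delivery; on reconnect the reader resumes from that index. `AppendOnlyLog`, `Reader`, and `Channel` are hypothetical in-memory stand-ins, not the hypercore API, and signing, encryption, and replication are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AppendOnlyLog:
    """Stand-in for one private hypercore: entries are only ever appended."""
    entries: List[bytes] = field(default_factory=list)

    def append(self, data: bytes) -> int:
        self.entries.append(data)
        return len(self.entries) - 1  # index of the new entry


@dataclass
class Reader:
    """Tracks how far one peer has read the other peer's log."""
    log: AppendOnlyLog
    next_index: int = 0

    def sync(self) -> List[bytes]:
        # On (re)connect, fetch everything from the first unseen index.
        new = self.log.entries[self.next_index:]
        self.next_index = len(self.log.entries)
        return new


@dataclass
class Channel:
    """One log per direction, as in the proposal."""
    a_to_b: AppendOnlyLog = field(default_factory=AppendOnlyLog)
    b_to_a: AppendOnlyLog = field(default_factory=AppendOnlyLog)
```

If peer A appends `b"req 2"` and `b"req 3"` while B is offline, B's next `sync()` returns both, in order, without A having to retransmit anything explicitly.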

bnewbold • Jul 12 '18 17:07

+1 to starting this discussion, thanks @bnewbold

Regarding the first idea (key distribution), we might consider using org/group dats for identity rather than individual dats. For instance, dat://bluelinklabs.com could provide a shortname of [email protected] instead of using dat://pfrazee.com. If we find multi-writer is a bad basis for key revocation and so can only use single-writer, an org/group dat would be more feasible, since it would likely be managed on a server and not require multi-device support. Additional authentication signals (e.g., WoT) could be layered on top of the identity-provider dats, and anybody is free to create an org/group dat for themselves.

Regarding the second idea (authenticated connections), I think there are a number of interesting ideas here that are worth considering. I need to think much more about the target UX and use-cases to comment on the details. What are the use-cases & flows we must support, and what are the risks we must mitigate? Perhaps we could start the conversation there and work backwards.

Leaving this here before I forget about it: I liked a lot of the properties of Dominic's secret-handshake protocol and would suggest giving it a look. It relies on keys being exchanged and authenticated prior to the connection, which means that no identity information should be leaked by a connection attempt.
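For reference, the first message of secret-handshake, as I understand its spec, is the client's ephemeral public key prefixed by an HMAC of that key under a pre-shared network key, using libsodium's `crypto_auth` (HMAC-SHA-512 truncated to 32 bytes). A server that does not recognize the tag is expected to hang up, so a peer without the pre-shared key learns nothing further. This sketch covers only that first step; random bytes stand in for a real Curve25519 ephemeral key, and the later key-agreement and identity-proof steps are omitted.

```python
import hashlib
import hmac
import os
from typing import Optional


def auth(key: bytes, msg: bytes) -> bytes:
    # Equivalent of libsodium crypto_auth: HMAC-SHA-512, first 32 bytes.
    return hmac.new(key, msg, hashlib.sha512).digest()[:32]


def client_hello(network_key: bytes, ephemeral_pk: bytes) -> bytes:
    # 64 bytes: tag over the ephemeral key, then the ephemeral key itself.
    return auth(network_key, ephemeral_pk) + ephemeral_pk


def server_check_hello(network_key: bytes, hello: bytes) -> Optional[bytes]:
    # Returns the client's ephemeral key, or None (i.e. hang up).
    if len(hello) != 64:
        return None
    tag, ephemeral_pk = hello[:32], hello[32:]
    if not hmac.compare_digest(tag, auth(network_key, ephemeral_pk)):
        return None
    return ephemeral_pk


if __name__ == "__main__":
    network_key = os.urandom(32)  # shared out-of-band by all participants
    eph = os.urandom(32)          # placeholder, not a real Curve25519 key
    assert server_check_hello(network_key, client_hello(network_key, eph)) == eph
```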

pfrazee • Jul 12 '18 18:07

The session and ephemeral messaging extensions are mostly what I'm responding to, but here are some more specific use cases:

  • requesting write access to a public multi-writer hyperdb feed (e.g., a cabal chat room). Relatedly (sort of equivalent), a way to "propose" changes/patches to a feed
  • "registration" and "push" mechanisms for datbase.org, hashbase, and other dat hosts/archives, including account sign-up (as an alternative to an HTTP web interface)
  • a mechanism for "ping backs" or other first-time event transmission (i.e., you responded to or annotated somebody's hypercore content and want to let them know about your content with low latency, and you don't have any other communication channel with that person)
  • establishing real-time communication with a peer (low-latency direct chat, audio, video; à la WebRTC)
  • direct control of "things" with low latency (e.g., robotics, a coffee machine): you could configure the device to subscribe to a hypercore and append requests to that, and subscribe to a response/acknowledgement feed, but you might want a more direct connection (stream/circuit style) for low-latency control. I guess this is redundant with the real-time communication item above
  • verification of feed replication status from specific devices/parties. E.g., maybe you want to delete photos from your phone. You've already pushed them to a hyperdrive feed, but you want to double-check that your home server and laptop have fully replicated copies first. You need to make a hypercore connection to specific devices (or personas), not just "any swarm member".
  • traditional payments (credit card or bank account, not cryptocurrency): you're not supposed to log/track/retain "payment card information", as a regulatory (and security) issue. We need a way to send authenticated, encrypted, secure messages to specific parties. Maybe cryptocurrencies will become stable/useful enough to use in non-toy economies soon, but I'm not holding my breath.
  • transfer of trust. E.g., scan a QR code to get your phone and laptop to talk directly and cross-sign or transfer a private key (it's harder to have your laptop scan a QR code from your phone, though sometimes technically possible)
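The replication-check use case can be made concrete as a small predicate: given per-device reports of which block indices of a feed each device holds, only delete local copies if every required device holds every block. How those reports would be fetched over an authenticated connection to a specific device is exactly the missing piece discussed here; `safe_to_delete` and the report format are hypothetical.

```python
from typing import Dict, Iterable, Set


def fully_replicated(feed_length: int, have: Set[int]) -> bool:
    # True if the device holds every block index 0..feed_length-1.
    return have.issuperset(range(feed_length))


def safe_to_delete(feed_length: int,
                   reports: Dict[str, Set[int]],
                   required: Iterable[str]) -> bool:
    # A required device that never reported counts as "not replicated".
    return all(fully_replicated(feed_length, reports.get(device, set()))
               for device in required)
```

For example, with reports `{"home-server": {0, 1, 2, 3, 4}, "laptop": {0, 1, 2, 4}}` for a 5-block feed, deletion is safe if only the home server is required, but not once the laptop is required too.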

To me, the "push new feed to repository", "submit a patch/change/multi-writer access", and "establish bi-directional real-time communications" patterns are the most important. These are things people already need to do, and currently we just hand-wave "do it out of band", which usually means "use traditional HTTP". I'm not trying to burn down the HTTP world all at once (it'll be around for the long haul, I think), but to incrementally provide alternatives. I'll note that this proposal is pretty similar in "features provided" to Tor hidden services, which I think have much better security/privacy properties. I'll also note that I don't have any intention to implement this ("working code") myself any time soon; I'm just spitballing. I'll check out secret-handshake, thanks for the reference.

bnewbold • Jul 12 '18 21:07

cc: @pvh

bnewbold • Jul 12 '18 21:07

The stuff you're describing seems more like a fundamental building block that you would run dat over instead of the other way around.

RangerMauve • Jul 12 '18 22:07

I guess I think about this stuff a little differently because I live down in hyper* land, but hyperdiscovery gives me some of these properties already, and the remaining bits appear to be things I can build on hyperdiscovery.

To wit, in my conception, an identity is a document defined by a discovery key. In my case, the document can have multiple writers. The discovery key can't be guessed, and although we have not implemented it yet, my intention is to have each client able to validate that all contributing authors to an identity document had some kind of signing-chain keybase-esque linked authority to contribute to said document.
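The signing-chain check described above could look roughly like a reachability test: each grant record says "key X authorizes key Y", and an author is acceptable only if it can be reached from the identity's root key through such grants. This sketch assumes the signature on each grant was already verified elsewhere and ignores revocation; `authorized_keys` and `invalid_authors` are hypothetical names.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def authorized_keys(root: str, grants: Iterable[Tuple[str, str]]) -> Set[str]:
    """Keys reachable from `root` through (granter, grantee) records.

    Assumes each record's signature by `granter` was verified beforehand.
    """
    children: Dict[str, List[str]] = {}
    for granter, grantee in grants:
        children.setdefault(granter, []).append(grantee)
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first walk of the authority chain
        key = queue.popleft()
        for child in children.get(key, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def invalid_authors(root: str, grants: Iterable[Tuple[str, str]],
                    authors: Iterable[str]) -> Set[str]:
    # Authors of the identity document with no chain back to the root.
    ok = authorized_keys(root, grants)
    return {a for a in authors if a not in ok}
```

A grant signed by a key that is itself unauthorized (e.g. one attacker key vouching for another) never connects to the root, so its grantee is flagged.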

But I'm getting a bit ahead of myself.

Before getting too down into the weeds of what dat, or hypercore, or some other library might want or need to implement, can we define terms & use cases a little bit here?

The identity system you need for a publication platform is different from a collaboration one, and the threat models you're protecting against if you're trying to smuggle state secrets are different from those I use to secure my Twitter account.

I'll go first, in the hopes that it might help to frame the discussion.

I'm building applications that let small numbers of people collaborate on personal information. It's important to me that their privacy is respected and that they can have a reasonable expectation that the data they're consuming is from the people they think it is, and that misbehaving clients can't harm the system.

I'm okay with a privacy model analogous to what HTTPS offers -- an observer can see what destinations you're connecting to and roughly how much data you're sending around and to whom -- but cannot read or manipulate the contents.

Identities serve to provide useful information about the person you're collaborating with. Just as with gmail I could register a new account, set the name to Peter van Hardenberg, and start emailing people and you'd probably believe it was me without further proof... I'm comfortable with the risk of spoofing identities and I'm not trying to provide robust mapping of "accounts" to people.

I don't have any particular objection to stronger guarantees, and indeed, I'd be delighted to have them, but if I'm building for myself (and I am) I'll probably be satisfied with a solution that meets these criteria.

pvh • Jul 13 '18 06:07