CIP: TipSync
cip: 75
title: TipSync
author: Joel Thorstensson (@oed)
discussions-to: https://github.com/ceramicnetwork/CIP/issues/75
status: Idea
category: Standards
type: Core
created: 2020-11-15
Simple Summary
A scalable approach to syncing stream tips in Ceramic using a libp2p protocol and the libp2p DHT.
Abstract
By utilizing the libp2p DHT along with a new libp2p protocol (inspired by bitswap), we design a system that allows a peer to find all other peers that currently pin a given stream and exchange tips with them. Each peer uses the DHT to tell the network which streams it pins, and a new protocol is introduced for querying tips from any peer.
Motivation
Currently Ceramic uses a libp2p pubsub topic to publish and query tips of all streams in the network. As the number of queries in the network is expected to be very large, this approach will likely soon face scalability issues. This CIP suggests an alternative approach to querying stream tips in order to mitigate the issue.
Specification
The TipSync protocol consists of two components: TipExchange itself and TipDiscovery. The former describes a libp2p protocol for querying tips from connected peers; the latter describes how to discover peers that hold the tip of any given stream.
TipExchange
TipExchange is a libp2p protocol with the following protocol id:
/ceramic/tipx/1.0.0
The algorithm can be described in two simple steps:
- When a Ceramic peer wants to query a specific StreamId it sends a want-tip message to all of its peers.
- Peers that currently pin the given stream respond with a have-tip message along with the CID of the tip they have.
In the graphic above Peer A sends a want-tip message to Peers B, C, and D. Peers B and D have the given stream pinned and thus respond with the tip. In this case they respond with different CIDs (the reason they are out of sync and how that is resolved is out of scope here), and it's now up to Peer A to do conflict resolution.
Message formats
The structure of the want-tip and have-tip messages is specified below.
interface StreamQuery {
  stream: string
  paths?: Array<string>
}

interface WantTip {
  typ: 3
  id: string
  streams: Array<StreamQuery>
}

interface TipMap {
  [docid: string]: string
}

interface HaveTip {
  typ: 4
  id: string
  tips: TipMap
}
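As an illustration of how a peer might answer a want-tip message, here is a minimal sketch. It repeats the interfaces above so that it is self-contained. The `PinnedTips` map and the `answerWantTip` function are hypothetical names, not part of this CIP. The sketch also assumes that the reply echoes the request `id`, that peers pinning none of the requested streams stay silent, and that `paths` can be ignored; the specification does not state any of these.

```typescript
// Message shapes repeated from the specification text.
interface StreamQuery { stream: string; paths?: Array<string> }
interface WantTip { typ: 3; id: string; streams: Array<StreamQuery> }
interface TipMap { [docid: string]: string }
interface HaveTip { typ: 4; id: string; tips: TipMap }

// Hypothetical local state: StreamId string -> CID string of the tip we hold.
type PinnedTips = Map<string, string>

// Build a have-tip reply covering the requested streams that we pin.
// Returns null when we pin none of them (assumption: such peers do not reply).
function answerWantTip(want: WantTip, pinned: PinnedTips): HaveTip | null {
  const tips: TipMap = {}
  for (const query of want.streams) {
    // `query.paths` is ignored in this sketch.
    const tip = pinned.get(query.stream)
    if (tip !== undefined) tips[query.stream] = tip
  }
  if (Object.keys(tips).length === 0) return null
  // Assumption: echo the request id so the sender can match reply to query.
  return { typ: 4, id: want.id, tips }
}
```

Because a single want-tip message carries an array of streams, one reply can cover several streams at once, which keeps the message count per peer low.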
TipDiscovery
The TipExchange libp2p protocol described above works well for getting the latest tips from already connected peers. However, the given stream might be pinned on a peer to which we are not connected. The TipDiscovery protocol uses the libp2p DHT to find all peers that pin any given stream. The basic idea of the DHT peer lookup is simple. When a Ceramic peer pins a stream, it tells the DHT that it provides this stream. It also looks up all other nodes that are providers of this stream and queries them for the latest tip of the stream.
The libp2p DHT can be used to announce to the network that your node provides content for a given CID. In the IPFS network this is primarily used to signal that you hold the data of the given CID. However, we can create a CID that represents the StreamId of a Ceramic stream and thus have a way of signaling which Ceramic peers pin any given stream.
Representing a StreamId in the DHT
To represent the StreamId as a CID we use the identity multihash along with the raw multicodec, then put the bytes of the StreamId after that. The resulting CID bytes should be constructed as follows:
<CIDv1-multicodec><raw-multicodec><multihash-multicodec><StreamId-length><StreamId-bytes>
0x01 | 0x55 | 0x00 | <StreamId-length> | <StreamId-bytes>
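The byte layout above can be sketched as follows. The function names `encodeVarint` and `streamIdToCidBytes` are hypothetical. The sketch assumes that `<StreamId-length>` is encoded as an unsigned varint, as the multihash format does for digest lengths; the text above does not say this explicitly.

```typescript
// Encode a non-negative integer as an unsigned varint (7 bits per byte,
// least significant group first, high bit set on all but the last byte).
function encodeVarint(value: number): number[] {
  if (!Number.isInteger(value) || value < 0) {
    throw new RangeError("varint value must be a non-negative integer")
  }
  const out: number[] = []
  let v = value
  while (v >= 0x80) {
    out.push((v & 0x7f) | 0x80)
    v = Math.floor(v / 128)
  }
  out.push(v)
  return out
}

// Build the CID bytes that represent a StreamId in the DHT:
// 0x01 (CIDv1) | 0x55 (raw) | 0x00 (identity multihash) | length | bytes
function streamIdToCidBytes(streamIdBytes: Uint8Array): Uint8Array {
  const prefix = [0x01, 0x55, 0x00, ...encodeVarint(streamIdBytes.length)]
  const cid = new Uint8Array(prefix.length + streamIdBytes.length)
  cid.set(prefix, 0)
  cid.set(streamIdBytes, prefix.length)
  return cid
}
```

Since the identity multihash embeds the StreamId bytes unchanged, any peer can recover the StreamId from the CID without an extra lookup.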
Providing the document
Use the DHT Provide method to provide the CID representing the StreamId. The timeout option should be set to a reasonably short time interval since there is no way to manually remove the DHT record. A Ceramic peer should republish the DHT record before the timeout ends, given that the stream is still pinned.
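One way to decide which records to republish is sketched below. Everything here is hypothetical: the `ProvideRecord` shape, the `dueForRepublish` function, and the idea of a safety margin before expiry are illustrative assumptions, and `ttlMs` stands in for whatever timeout was passed to Provide.

```typescript
// Hypothetical bookkeeping entry: when we last provided a given stream.
interface ProvideRecord {
  streamId: string
  providedAtMs: number
}

// Return the StreamIds whose DHT record should be republished now.
// A record is due once it is within `marginMs` of expiring, and only
// if the stream is still pinned; unpinned streams are left to expire.
function dueForRepublish(
  records: ProvideRecord[],
  pinned: Set<string>,
  nowMs: number,
  ttlMs: number,
  marginMs: number
): string[] {
  return records
    .filter((r) => pinned.has(r.streamId))
    .filter((r) => nowMs >= r.providedAtMs + ttlMs - marginMs)
    .map((r) => r.streamId)
}
```

A node could run such a check periodically and at startup, when every pinned stream has no recent record and therefore needs to be provided again.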
Finding providers of the document
Use the DHT FindProvs method to look up peers that provide the given stream. Connect to each of the found peers (or a subset of them) and send them the want-tip query.
Querying a stream
The full algorithm for querying a stream would look something like this:
- Run the TipSync protocol on currently connected peers
- Traverse the DHT to find all peers pinning the stream
- Connect to peers as they are found and run TipSync with them
Note that we can't be completely sure that we have the most up-to-date state of a stream until our peer has connected to, and run the TipSync protocol with, all peers which pin the given stream. However, it might be reasonable to respond to a query optimistically before that.
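The three steps of the query algorithm could be sketched roughly as below. The `TipNetwork` interface and all of its methods are hypothetical stand-ins for the TipExchange and DHT calls, not an existing API. For brevity the sketch waits for the full provider list instead of connecting to peers as they are found, and it returns every tip it collected so that the caller can do conflict resolution.

```typescript
// Hypothetical abstraction over TipExchange and the DHT.
interface TipNetwork {
  connectedPeers(): string[]
  findProviders(streamId: string): Promise<string[]>
  // Resolves to the tip CID, or null if the peer does not pin the stream.
  requestTip(peerId: string, streamId: string): Promise<string | null>
}

// Collect tips for one stream; the result maps peer id -> tip CID.
async function queryStream(
  net: TipNetwork,
  streamId: string
): Promise<Map<string, string>> {
  const tips = new Map<string, string>()
  const asked = new Set<string>()

  const ask = async (peerId: string): Promise<void> => {
    if (asked.has(peerId)) return // never query the same peer twice
    asked.add(peerId)
    try {
      const tip = await net.requestTip(peerId, streamId)
      if (tip !== null) tips.set(peerId, tip)
    } catch {
      // Assumption: an unreachable peer is skipped rather than failing the query.
    }
  }

  // Step 1: query the peers we are already connected to.
  await Promise.all(net.connectedPeers().map(ask))
  // Steps 2 and 3: find providers in the DHT and query the new ones.
  const providers = await net.findProviders(streamId)
  await Promise.all(providers.map(ask))
  return tips
}
```

An optimistic variant could hand the result of step 1 to the caller right away and refine it once the DHT lookup completes.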
Open questions
- A node that keeps track of a lot of documents would now potentially need to connect to more nodes. In return, nodes can be more certain that they can find the latest state of documents. Is there a trade-off to be made here?
- When publishing updates to a given document, should nodes just push this update to peers that care about it? Probably keep publishing to the Ceramic pubsub topic for now.
Future work
- Extend the protocol to stream commit data from peers that have responded with a have-tip message. This could significantly improve the performance of Ceramic.
Rationale
Rationale goes here.
Backwards Compatibility
This feature needs to be rolled out in stages. The first stage adds DHT publishing and support for responding to queries, while sending the want-tip message over TipSync stays gated behind a feature flag. Once some time has passed (e.g. a month) and most nodes have upgraded, the feature can be turned on in a new release.
Ceramic peers still connect to the Ceramic pubsub topic to publish updates to streams. They can also respond to queries made by older nodes for some period of time until this old query method is completely phased out.
Implementation
No implementation yet.
Security Considerations
To be completely sure that a query results in the latest state of a stream, the query protocol must find all peers in the DHT that pin the stream and get the tip from each of them. If even one peer is left out, that peer may have a more recent update (even if this is unlikely). In order to improve this situation over time, it can make sense to use the TipSync protocol in parallel with the way tips are published on updates. Note however that some trade-off may be possible where the result of the query is returned before we have a result from all peers that pin the given stream.
Copyright
Copyright and related rights waived via CC0.
> When a Ceramic peer pins a stream it tells the DHT that they provide this stream. They also look up all other nodes that are providers of this stream and query them for the latest tip of the stream.
This would presumably have to happen at node startup as well, right? So we'd have to iterate over the entire pin store at startup to inform the DHT of what streams we are providing.
> The timeout option should be set to a reasonably short time interval since there is no way to manually remove the DHT record. A Ceramic peer should republish the DHT record before the timeout ends, given that the stream is still pinned.
This would also have to be done for every stream in the pinset. Could get expensive if a node has many streams pinned?
> The want-tip message over TipSync is however gated behind a feature flag. Once some time has passed, e.g. a month, such that most nodes have upgraded the feature can be turned on in a new release.
We might not need to do this. At the cost of some extra bandwidth, we could have a period of time where we simply use both query protocols simultaneously and consider any tips found via either lookup approach. Eventually we could start logging warnings if the pubsub lookup is finding better tips than the libp2p lookup. Once no one is seeing those warnings anymore we can do a release that turns off pubsub lookups by default.
> This would presumably have to happen at node startup as well, right? So we'd have to iterate over the entire pin store at startup to inform the DHT of what streams we are providing.
Yes, good point.
> This would also have to be done for every stream in the pinset. Could get expensive if a node has many streams pinned?
I believe publishing to the DHT is quite cheap. IPFS peers already do this for all of the CIDs which they pin.