stacks icon indicating copy to clipboard operation
stacks copied to clipboard

Blog Post: DHT

Open pstan26 opened this issue 7 years ago • 3 comments

Sounds like @jcnelson is 60% done -- go Jude!

Once we have it completed, let's plan to:

  • [ ] get public reviews from P2P experts (Frans etc)
  • [ ] Request comment from Mike Freedman, potentially Petar, and others

Goal here is to create one post to clarify fundamental limitations of DHTs. We need to get reviews and request some lines be made public. There are many decentralized storage systems looking for a solution -- the fundamental DHT issues apply to any project using them.

Maybe we can add bullets on the post here or link to the blog post, @jcnelson ? Is it reasonable to aim for April 10th release?

pstan26 avatar Mar 16 '17 19:03 pstan26

Here's what I have so far:


One of the more interesting innovations in Blockstack is in how it distributes the set of zone files to all peers.
Like several other decentralized Web infrastructure projects, the original system used a DHT (distributed hash table). This was acceptable for building a prototype implementation while the network was small and mostly-reliable. However, this would have been unacceptable in the long-term due to the stability and security problems inherent to DHTs. Since November 2016, we have been distributing zone files using a decentralized storage system built from first principles called the Atlas Network.

DHTs: Unsafe at any Speed

(This was previously discussed here, and in the middle of this blog post).

DHTs came into being in the 1990's and 2000's, and were among the first scalable decentralized data stores. The most widely-used DHTs power file-sharing software like BitTorrent and eMule. These are based on the Kademlia DHT, and enjoy widespread usage today.

Easy to Misuse

Since the mid 2000's, the distributed systems research community has learned several hard but important lessons about the usability of DHTs. In particular, DHTs are only useful if all of the following are true for your application:

  • You can't store a 100% replica. Your application stores so much data that it is unreasonable to try to keep a full copy on any single node.

  • You will tolerate arbitrarily slow I/O. Due to the way DHTs handle requests, reads and writes have very variable latency. Some incrmenetal work on DHTs (like DSHTs) try to fix this for frequently-requested data, but the long-tail performance is still terrible. This is inherent to DHT routing protocols; a pathological request can needlessly bounce through several high-latency network links before being handled.

  • You will periodically re-announce data. Public-access DHTs allow anyone to write to them. To deal with so much data, they simply delete stale data to relieve memory pressure. At the same time, nodes can come and go as they please, which can cause key/value pairs to get dropped. To deal with this, applications need to re-replicate data every so often to keep it available.

  • You will never change a key/value pair. DHTs can split into one or more disjoint networks due to partitions, and re-join later on. While split, a client can write conflicting key/value pairs to nodes on either side of the partition. This leads to externally-visible inconsistency--some clients will see one value for the key, and other clients will see a different value. To avoid this, the application must ensure that there is at most one value per key (i.e. data is immutable once written).

These are difficult demands to meet. Each application process effectively needs to be online 24/7 to ensure its data is available, and even then the developer needs to adress data consistency and I/O performance via some other way.

The reason BitTorrent and eMule can get away with using DHTs is because (1) they use them only for "soft state," and (2) they do not rely on them for correctness. Specifically, they use a DHT to store tracking and routing information for a particular file.
This information is small by comparison to the file, and can be re-created quickly by any peer. There are no consistency concerns since since the .torrent file contains all the chunk hashes, and there are no negative consequences of receiving invalid tracking data since the hashes are known to all peers beforehand. Peers also do not share data via the DHT, but connect to one another directly.

Hard to Secure

The difficulties do not end there, though. DHTs are still vulnerable to three types of attacks that can harm even the most resilient DHT-powered application.

  • Endless Data DDoS. Without some rate-limiting or access-control mechanism, a DHT has no way to limit the amount of data inserted. An adversary can flood the DHT with lots of garbage data and knock nodes offline. (This never applied to Blockstack, since our DHT only accepted key/value pairs whose hashes were written to the Bitcoin blockchain).

  • Hash-Bucket Takeover. By joining with the right nodes in a DHT, a Sybil node can take over responsibility for specific hash buckets in the network. In doing so, the Sybil will both be able to see who's asking for the data (a privacy concern), and will be able to censor the data by ignoring or slowing down requests (an availability concern). A Sybil node can also do extensive damage to a running DHT by repeatedly joining it, taking over heavily-requested buckets, and then dropping them.

  • Route-table Takeover. Suppose that the DHT let the developer "pin" hash buckets to honest nodes to thwart hash-bucket takeovers. This does not fix the availability problem, since a Sybil can insert itself as the immediate neighbors of the honest node and prevent the rest of the system from discovering it. This works because DHT nodes do not have a global view of which nodes have which buckets. Instead, each node keeps a short list of which nodes host buckets that are "close to" its buckets in the key space. If the Sybil node can become an honest node's neighbors, then it can prevent other nodes from discovering the honest node's data.

jcnelson avatar Mar 16 '17 20:03 jcnelson

Thanks for posting Jude! I have a bunch of (private) notes on this and can take a crack at expanding the draft.

muneeb-ali avatar Mar 16 '17 20:03 muneeb-ali

Doesn't look like this made it into a blog post. @jcnelson @pstan26 Would this still be useful?

jackzampolin avatar Jan 10 '18 22:01 jackzampolin