rust-libp2p Sybil defence

Description

Sorry to bother you. I am wondering how much, if any, of this paper https://ssg.lancs.ac.uk/wp-content/uploads/ndss_preprint.pdf is implememented in this crate?

Much of the paper seems to make sense and I belive the go impl has made strides in this direction.

Motivation

Sybil defence at the network layer would benefit every decentralised project and allow a focussed approach.

Current Implementation

I am not familiar enough with the codebase to infer any insights here.

Are you planning to do it yourself in a pull request ?

No

Oct 30 '23 09:10 dirvine

I don't think it is a valid topic here in libp2p. libp2p is built heavily around connections, not the application and network built on top of it.

Nov 01 '23 03:11 drHuangMHT

Thanks for posting this. Can you summarise what the mitigation strategy presented in the paper is? I tried to skim it but couldn't find a concise summary.

Nov 01 '23 09:11 thomaseizinger

@drHuangMHT I understand your position, however this approach provides an API addition that upper layers can use for effective sybil defence. If we make this happen, the man projects could benefit from a network layer approach.

@thomaseizinger I have used Claude2 to help me summarise this. I find it more effective than a human.

Here is a summary of the Sybil mitigation strategy described in the paper:

The attack exploits a vulnerability in the Kademlia DHT used for content resolution in IPFS. An attacker can generate multiple Sybil identities that are closer to a target content ID (CID) than any honest peers. This allows the attacker to control the resolvers for that CID and prevent discovery of the content.
The mitigation strategy involves using region-based queries instead of contacting only the k closest peers to a CID. The region size is calculated to contain k peers on average, based on an estimate of the total network size.
With region queries, providers store records on and requesters retrieve records from approximately k honest resolvers, even if the attacker adds Sybil nodes.
The mitigation mechanism is only activated when the attack is detected using a statistical test based on the KL divergence between actual and expected peer ID distributions. This avoids overhead when no attack is present.
Detection uses the peer IDs returned for a CID to check if they appear suspiciously close to the CID, indicating potential Sybil nodes. No additional messages are required.
The mitigation technique ensures content discovery with high probability for any CID under attack. It increases costs for attackers linearly with the number of Sybils they introduce.
The solution can be deployed incrementally, requires no changes to the core DHT protocol, and maintains interoperability with existing IPFS nodes.

To implement the Sybil mitigation strategy in rust-libp2p, the main changes would be:

Add support for region-based queries in the Kademlia DHT implementation. This involves finding all peers within a region of a certain distance from a key, rather than just the k closest peers.
Implement the network size estimation mechanism. This is used to dynamically calculate the region size for the number of expected honest peers k.
Add attack detection using KL divergence of peer IDs. Can be implemented as part of the lookup for provider records.
Activate region queries only when attack is detected. Use normal lookup otherwise.
Provider and requester logic to use region queries when mitigation is active.

Some specifics:

The region query can be implemented via iterative calls to get_closest_peers() to find peers matching more prefix bits.
Network size estimation calculates model distribution of distances to estimate size.
KL divergence computes distance between empirical and model peer ID distributions.
Detection runs on providers and requesters. Can use a shared threshold.
Mitigation queries contact more peers, increasing costs for attackers.

This achieves censorship resistance for target CIDs under attack by ensuring providers and requesters communicate through regions guaranteed to have honest peers. The changes maintain compatibility with existing peers. Region queries are only used when needed to limit overhead. Overall, this provides an effective and practical solution to the DHT vulnerability.

Nov 01 '23 09:11 dirvine

As a follow up on KL divergance, I hope this is useful

The Kullback-Leibler (KL) divergence is a statistical measure used to quantify the difference between two probability distributions. It can be used to detect if an empirical distribution of samples differs significantly from an expected theoretical distribution.

In the context of detecting Sybil attacks in DHTs, here is an explanation of how KL divergence is used:

Let p(x) be the theoretical probability distribution of common prefix lengths between peer IDs and a target CID, assuming uniform random peer IDs. This can be calculated based on the estimated network size.
Let q(x) be the observed empirical distribution of common prefix lengths from the peers returned for a lookup for that CID.
If there is no attack, q(x) should closely match p(x). But under an attack with Sybils, q(x) will diverge from p(x).
The KL divergence D(q||p) quantifies the difference between the two distributions:

D(q||p) = Σx q(x) * log(q(x) / p(x))

It measures the information lost if p is used to approximate q. Higher values indicate larger divergence.
If D(q||p) is greater than a chosen detection threshold, we conclude the distributions are significantly different, indicating a likely Sybil attack.
The threshold balances false positives and negatives. A higher threshold leads to fewer false positives but more missed attacks.

So in summary, KL divergence provides a principled statistical test to detect Sybil attacks by quantifying the mismatch between the actual and expected peer ID distributions, without needing direct labels about which peers are Sybils.

Nov 01 '23 09:11 dirvine

Thank you for this!

Region-based queries are not implemented in libp2p-kad and I don't know of anybody that is planning on implementing them. Happy to mentor you on the codebase though if you want to take a stab at it :)

@mxinden as the original author of iibp2p-kad should likely weigh in on this too.

From what I understand, this is entirely an implementation-detail (modulo perhaps a config parameter for the divergence threshold).

Nov 01 '23 09:11 thomaseizinger

It's a pleasure. Thank you for the mentor offer. I am time limited, but I will ask the guys (maidsafe) and see if we can line up some resource when we see how Max feels about this as well. It also may be a nice one to fund as a grant of some kind. I think the freenet guys would also benefit. @sanity may also be interested in this approach.

Nov 01 '23 10:11 dirvine

@dirvine sorry for the delay. Thanks for starting this conversation.

Much of the paper seems to make sense and I belive the go impl has made strides in this direction.

@dennis-tra given your familiarity with the Go implementation and your recent work on it, can you add more details here?

I am wondering how much, if any, of this paper https://ssg.lancs.ac.uk/wp-content/uploads/ndss_preprint.pdf is implememented in this crate?

As Thomas said above, today neither the detection nor the mitigation mechanism is implemented in rust-libp2p. Unfortunately I am also not aware of any out-of-tree implementation in Rust.

this is entirely an implementation-detail

Agreed. This simplifies the implementation process significantly, given that no coordination with other peers is needed.

I am time limited, but I will ask the guys (maidsafe) and see if we can line up some resource when we see how Max feels about this as well. It also may be a nice one to fund as a grant of some kind.

Open for contributions! Would be great to see an implementation in Rust. Note however that I would only consider merging it when the gained security is worth the complexity it comes along with.

(@dennis-tra since you know the authors, they might enjoy the interest here.)

Nov 20 '23 20:11 mxinden