asmap should be used for relay selection
Problem
With OHTTP opt-in from arbitrary relays, the sender's and receiver's choices of relays and directory are independent.
E2EE messages flow along this path:
graph LR
sender --- ohttp_relay_1;
ohttp_relay_1 --- directory;
directory --- ohttp_relay_2;
ohttp_relay_2 --- receiver;
Unfortunately, if both the directory and relay(s) are in the same autonomous system (AS), this can weaken OHTTP's privacy, which is already pretty lenient. For example, suppose two independent service operators both choose the same cloud provider:
graph LR
sender --- ohttp_relay_1;
subgraph AS
ohttp_relay_1 --- directory;
end
directory --- ohttp_relay_2;
ohttp_relay_2 --- receiver;
Anyone who can observe traffic inside that AS can use this to link users to specific (encrypted) messages on the directory.
This is also the case if a user and the directory share an AS.
graph LR
sender --- ohttp_relay_1;
ohttp_relay_1 --- directory;
directory --- ohttp_relay_2;
ohttp_relay_2 --- receiver;
subgraph AS
sender;
directory;
end
A similar but more serious concern arises if both relays are in the same AS: traffic analysis can then link the sender and receiver, even if the directory and the users are in different ASes:
graph LR
sender --- ohttp_relay_1;
ohttp_relay_1 --- directory;
directory --- ohttp_relay_2;
ohttp_relay_2 --- receiver;
subgraph AS
ohttp_relay_1;
ohttp_relay_2;
end
Unfortunately, in the (presumably not uncommon) case of two mobile users in the same physical location sharing the same mobile provider, this is still inherently linkable through traffic analysis. VPNs can mitigate this, especially if there is cover traffic, and it helps that BIP 77 is fairly low bandwidth, but this traffic analysis concern is inherent.
graph LR
sender --- ohttp_relay_1;
ohttp_relay_1 --- directory;
directory --- ohttp_relay_2;
ohttp_relay_2 --- receiver;
subgraph AS
sender;
receiver;
end
Proposed Mitigation
Therefore it seems appropriate to consider providing code that can help wallet developers use https://asmap.org/ to select servers.
Given a list of trusted directories and relays, these lists should be filtered to exclude servers that share an AS with the user. This is potentially tricky, as it requires users to be able to determine their public IP(s).
All else being equal, a receiver should select a directory at random from the trusted filtered directories. Both sender and receiver will subsequently filter their list of trusted relays to additionally exclude the AS of this randomly chosen directory.
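A minimal sketch of the two filtering steps above, in Python for brevity. It assumes each server's ASN has already been resolved (for example from an asmap lookup of its IP) and that the user's own ASN(s) are known. The function names, example URIs, and ASNs (taken from the range reserved for documentation) are hypothetical and not part of payjoin.

```python
import random


def filter_by_asn(server_asn: dict[str, int], excluded: set[int]) -> dict[str, int]:
    """Keep only the servers whose ASN is not in the excluded set."""
    return {uri: asn for uri, asn in server_asn.items() if asn not in excluded}


def choose_directory(directory_asn: dict[str, int], user_asns: set[int], rng=None) -> str:
    """Receiver side: pick a directory at random among those outside the user's AS(es)."""
    rng = rng or random.SystemRandom()
    # Sort so the choice depends only on the RNG, not on dict insertion order.
    candidates = sorted(filter_by_asn(directory_asn, user_asns))
    if not candidates:
        raise ValueError("no trusted directory outside the user's AS")
    return rng.choice(candidates)


def usable_relays(relay_asn: dict[str, int], user_asns: set[int], directory_asn: int) -> dict[str, int]:
    """Both sides: drop relays that share an AS with the user or with the chosen directory."""
    return filter_by_asn(relay_asn, user_asns | {directory_asn})


# Hypothetical inputs: URI -> ASN, as resolved by an asmap lookup.
directories = {"https://dir-a.example": 64496, "https://dir-b.example": 64497}
relays = {
    "https://r1.example": 64496,
    "https://r2.example": 64497,
    "https://r3.example": 64498,
    "https://r4.example": 64499,
}
my_asns = {64496}

directory = choose_directory(directories, my_asns)
print(directory, sorted(usable_relays(relays, my_asns, directories[directory])))
```

The sender does not choose the directory, so it would only run `usable_relays` with the ASN of the directory the receiver chose.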
The receiver's ephemeral key is then generated. This key can be hashed with a tagged hash for domain separation, to generate a random seed shared by both sender and receiver. Let h be a hash function tagged by this random seed.
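One possible way to derive the seed and h, sketched in Python. The BIP 340 style tagged-hash construction, the tag string `payjoin/relay-selection`, and the placeholder key bytes are assumptions made for illustration; the proposal does not fix any of them.

```python
import hashlib


def tagged_hash(tag: bytes, msg: bytes) -> bytes:
    """BIP 340 style tagged hash: SHA256(SHA256(tag) || SHA256(tag) || msg)."""
    tag_digest = hashlib.sha256(tag).digest()
    return hashlib.sha256(tag_digest + tag_digest + msg).digest()


def relay_selection_seed(receiver_ephemeral_key: bytes) -> bytes:
    """Seed that both parties can compute from the receiver's ephemeral key."""
    # The tag string is a hypothetical placeholder for domain separation.
    return tagged_hash(b"payjoin/relay-selection", receiver_ephemeral_key)


def h(seed: bytes, data: bytes) -> bytes:
    """Hash function tagged by the seed, used to order ASNs and relay URIs."""
    return tagged_hash(seed, data)


# Placeholder bytes standing in for a serialized ephemeral key.
key = bytes.fromhex("02" + "11" * 32)
seed = relay_selection_seed(key)
print(h(seed, b"64496").hex())
```

Because the seed depends only on the receiver's ephemeral key, both parties obtain the same h without any extra communication.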
Both sender and receiver should sort their individual filtered list of relays according to the following rules:
- group relays by AS
- shuffle the AS buckets by h(ASN)
  - open question: for relays resolving to multiple IPs with different ASNs, what should be done? The simplest and most conservative thing is to just reject these entirely and discourage such deployments, only allowing one each of A and AAAA entries, and strongly suggest favoring IPv6 in the transport layer.
- shuffle the relays in each bucket by h(relay uri)
- cycle through the buckets popping relays one by one to obtain the receiver's total ordering, which is well defined (up to hash collisions) over the union of both users' sets of trusted relays
- sender only: reverse the ordering
Both parties should then select relays according to this ordering.
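The ordering rules above can be sketched as follows, in Python. Each party applies the function to its own filtered list. The construction of h (a SHA-256 hash tagged by the seed), the function names, and the example URIs and ASNs are assumptions for illustration; relays resolving to several ASNs are assumed to have been rejected already.

```python
import hashlib
from collections import defaultdict
from itertools import zip_longest


def h(seed: bytes, data: bytes) -> bytes:
    """Hash tagged by the shared seed (assumed construction)."""
    tag = hashlib.sha256(seed).digest()
    return hashlib.sha256(tag + tag + data).digest()


def order_relays(relay_asn: dict[str, int], seed: bytes, is_sender: bool) -> list[str]:
    """Deterministically order relays so consecutive picks come from different ASes."""
    # 1. Group relays by AS.
    buckets = defaultdict(list)
    for uri, asn in relay_asn.items():
        buckets[asn].append(uri)
    # 2. "Shuffle" the AS buckets by sorting them on h(ASN).
    asns = sorted(buckets, key=lambda asn: h(seed, str(asn).encode()))
    # 3. "Shuffle" the relays inside each bucket by sorting them on h(relay uri).
    columns = [sorted(buckets[asn], key=lambda uri: h(seed, uri.encode())) for asn in asns]
    # 4. Cycle through the buckets, taking one relay from each per round.
    ordering = [uri for row in zip_longest(*columns) for uri in row if uri is not None]
    # 5. The sender walks the same ordering from the other end.
    return ordering[::-1] if is_sender else ordering


# Hypothetical filtered relay list: URI -> ASN.
relays = {
    "https://a.example": 64496,
    "https://b.example": 64496,
    "https://c.example": 64497,
    "https://d.example": 64498,
}
seed = b"\x01" * 32  # placeholder for the seed derived from the receiver's key
print(order_relays(relays, seed, is_sender=False))
print(order_relays(relays, seed, is_sender=True))
```

Sorting by a seeded hash stands in for a shuffle here: the result looks random to anyone without the seed, but both parties compute the same order.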
Note that payjoin-cli's RelayManager currently uses the thread RNG to effectively shuffle the list of relays. Apart from the AS-based exclusions, this idea just changes that ordering to be deterministic, so that both users can more reliably avoid picking a relay from the same AS.
If both parties do this, the worst case scenario is still that they both share an AS and both pick a relay in the same AS:
graph LR
sender --- ohttp_relay_1;
ohttp_relay_1 --- directory;
directory --- ohttp_relay_2;
ohttp_relay_2 --- receiver;
subgraph AS2
ohttp_relay_1;
ohttp_relay_2;
end
subgraph AS1
sender;
receiver;
end
However, if they do not share an AS and they have sufficiently large lists of relays, then regardless of how much those lists overlap they can maximize their chances of selecting 3 distinct ASes for the route between them.
This should be optional. It could be an entirely separate crate, or, my preference, an optional feature in the spirit of the io feature, but a proof of concept can start as just a payjoin-cli feature.
More of a math nerd question that doesn't seem that important: is bucketizing like that actually optimal, or close to optimal, for avoiding the same AS? It seems a bit tricky to analyze, but maybe there's a simple combinatorial way of arguing for collision minimization?