ZIP: Tor transaction relay network - an in-progress standard for improved network privacy via auxiliary protocol.
This branch and PR build atop #389.
Some security/privacy considerations:
1. The different protocol actions, like sending versus receiving transactions, have different network traffic sizes/patterns. So even if a light wallet uses Tor, their ISP can still tell when they're sending/receiving, which they can use to link transactions (by timing across multiple users that they're surveilling), or confirm whether or not a zaddr belongs to a particular user, and probably some more attacks. One way to fix that is to use a constant-bandwidth protocol, with the obvious drawback it uses too much bandwidth.
2. A similar consideration applies to zcashd nodes: even if they're all using Tor, a global passive adversary can still tell which nodes are sending which transactions (by correlating the timing of traffic patterns that look like tx sends with appearances of new transactions in the mempool).
3. Light clients can be "watermarked" by the service provider. One not-very-scalable way to do that is to always keep the targeted user on an even block height and keep everyone else on an odd block height. When that user's wallet reconnects, you know it's them because they start asking from an even height, and everyone else would start asking from an odd height. Other changes to the protocol will make that even easier. For example, to save bandwidth, wallets don't want to fetch the entire mempool every time they poll it, so they'll remember some state about which mempool transactions they have, and return (some information about) that state to the service provider with each query. The service provider can watermark anything the wallet remembers and returns information about in future queries.
4. The number of simultaneous users of the service provider matters. Even if everything's coming in over fresh Tor circuits, if there's only one active user at the time, then you can trivially link their z2z transactions with their t2t transactions that come in around the same time and do analysis of the transparent blockchain to deanonymize them.
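To make the block-height watermark concrete, here is a minimal sketch of the even/odd trick described above. The class and method names are hypothetical and nothing here reflects real lightwalletd code; it only models a server that withholds at most one block to force a parity, then classifies a reconnecting client by the height it resumes from.

```python
from dataclasses import dataclass


@dataclass
class WatermarkingServer:
    """Toy malicious light wallet server (hypothetical, for illustration only)."""

    chain_tip: int  # true height of the newest block the server knows about

    def served_tip(self, is_target: bool) -> int:
        """Report an even tip to the target and an odd tip to everyone else."""
        tip = self.chain_tip
        tip_is_even = tip % 2 == 0
        if tip_is_even != is_target:
            tip -= 1  # withhold the newest block to force the desired parity
        return tip

    @staticmethod
    def looks_like_target(resume_height: int) -> bool:
        """Classify an anonymous reconnecting client by its resume height."""
        return resume_height % 2 == 0


server = WatermarkingServer(chain_tip=1001)
target_height = server.served_tip(is_target=True)    # 1000
other_height = server.served_tip(is_target=False)    # 1001
```

When either wallet later reconnects over a fresh Tor circuit and resumes from its remembered height, `looks_like_target` separates the two without any other identifier.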
1 & 2. All supported versions of Tor have connection padding; we should check whether Tor's default padding settings provide sufficient cover for Zcash transactions.
If they don't, we should try to get Tor's defaults changed. (Otherwise, if we changed the padding settings for Zcash, we'd create a distinguishable class of Zcash users, even if they are individually indistinguishable.)
Tor also has circuit padding, but last time I checked, they had only implemented experimental padding for onion service circuit setup. (If Zcash needs generic or Zcash-specific circuit padding, it could be developed via a foundation or major grant.)
1, 2, & 4. Zcash might also need to pad its network communication over Tor, or randomise the timing of transactions.
Ideally, Zcash should increase the overall transaction volume to provide cover traffic, but that's a harder problem.
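As a rough illustration of application-level padding and randomised timing, the sketch below pads every message to a fixed bucket size and draws a random send delay. The bucket size, the 4-byte length prefix, and the delay bound are all assumptions made up for this example; none of them comes from the ZIP or from Tor.

```python
import secrets

BUCKET = 4096  # assumed padding bucket size in bytes (not from any spec)


def pad_to_bucket(payload: bytes, bucket: int = BUCKET) -> bytes:
    """Length-prefix the payload, then zero-pad to the next multiple of `bucket`."""
    framed = len(payload).to_bytes(4, "big") + payload
    padded_len = -(-len(framed) // bucket) * bucket  # ceiling division
    return framed + b"\x00" * (padded_len - len(framed))


def unpad(padded: bytes) -> bytes:
    """Recover the original payload using the 4-byte length prefix."""
    n = int.from_bytes(padded[:4], "big")
    return padded[4:4 + n]


def random_send_delay(max_seconds: float = 600.0) -> float:
    """Draw a delay between 0 and `max_seconds` from the OS random source."""
    return secrets.SystemRandom().uniform(0.0, max_seconds)
```

With this shape, a send and a poll that both fit in one bucket look the same size on the wire. It does not hide how many buckets are exchanged, and if only Zcash clients pad this way it creates the distinguishable class of users mentioned above.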
Just a general comment about the context of this ZIP:
First of all, this is a "hobby / self-interest" project on my part, not an @Electric-Coin-Company project. This is because at ECC we believe there are multiple higher priorities for Zcash to help more people sooner than improving the current network privacy. But the current network privacy definitely needs improvement!
Second, I don't have super high confidence that this protocol draft will be workable or helpful. It might. I will be satisfied if it always remains a "strawman design" that newer projects point to and can say "well, this project is better than 'TTRN v0.1' because of X". However, I do believe there's potential for this protocol approach to become used as a prototype and it may even become useful for most or many Zcash users, so I'm going to keep collaborating to refine it up until something clearly better comes along.
Third: if this protocol has a clear threat model and it's likely to provide some privacy benefit to say ~50% or 80% of users or use cases, and it was an acceptable expense for 80-90% of users, then I would advocate making it "standard" for all wallets, precisely because making it standard would greatly benefit privacy.
Next a separate post about privacy model.
Privacy / Threat Model rough/rushed thoughts:
Many thanks to @defuse's comment for prompting me to clarify the threat model:
1. The different protocol actions, like sending versus receiving transactions, have different network traffic sizes/patterns. So even if a light wallet uses Tor, their ISP can still tell when they're sending/receiving, which they can use to link transactions (by timing across multiple users that they're surveilling), or confirm whether or not a zaddr belongs to a particular user, and probably some more attacks. One way to fix that is to use a constant-bandwidth protocol, with the obvious drawback it uses too much bandwidth.
This threat model does not attempt to protect a user's privacy at the network layer in any way that Tor doesn't already protect. Thus if an ISP can observe traffic patterns to make some identification/determination of Tor traffic generally, this protocol won't help.
I think this protocol attempts to help protect wallets against:
- malicious Zcash peers determining their "peer of origin", whether that be their direct full node, or their mobile/light wallet service provider's node.
- malicious mobile/light wallet service providers identifying which shielded transactions are sent by their users' wallets.
2. A similar consideration applies to zcashd nodes: even if they're all using Tor, a global passive adversary can still tell which nodes are sending which transactions (by correlating the timing of traffic patterns that look like tx sends with appearances of new transactions in the mempool).
This privacy/threat model does not protect against adversaries which Tor does not protect against, including a global passive adversary.
3. Light clients can be "watermarked" by the service provider. One not-very-scalable way to do that is to always keep the targeted user on an even block height and keep everyone else on an odd block height. When that user's wallet reconnects, you know it's them because they start asking from an even height, and everyone else would start asking from an odd height. Other changes to the protocol will make that even easier. For example, to save bandwidth, wallets don't want to fetch the entire mempool every time they poll it, so they'll remember some state about which mempool transactions they have, and return (some information about) that state to the service provider with each query. The service provider can watermark anything the wallet remembers and returns information about in future queries.
This is entirely about service providers learning which incoming transactions a wallet is interested in. This protocol doesn't interact at all with incoming transactions scanning. The weakness here still applies.
What this protocol changes is that in the current lightwalletd status quo, when wallets want to send a transaction, they contact their service provider, which already has a lot of insight into which transactions the wallet receives, and then ask the service provider to relay that transaction. The wallet provider has unambiguous linkage of which wallets send which txns.
With this protocol, wallet service providers no longer have that information! Instead random TTRN relays have that information. (So an attacker might obviously try to link metadata from both kinds of services to compromise this security goal.)
4. The number of simultaneous users of the service provider matters. Even if everything's coming in over fresh Tor circuits, if there's only one active user at the time, then you can trivially link their z2z transactions with their t2t transactions that come in around the same time and do analysis of the transparent blockchain to deanonymize them.
There is no way for either a hosted wallet provider or a TTRN provider to determine how many users use this system (by design; if there is, it's a vuln against the threat model).
I'm making some huge and important assumptions here that contradict the notion that this is a "purely auxiliary opt-in protocol":
First, all hosted wallets themselves would need to have a copy of the active TTRN roster. I guess I was imagining that wallet service providers always provide that to every wallet on every connection (so that it can't tell which wallets want to use that list). Problem: how can wallets know the list is complete?
One thought: they subscribe to the TTRN Wishing Well Viewing Key just like they subscribe to their own private addresses with the wallet provider. For this to provide privacy as to which users use TTRN, all wallets would need to make this subscription. This is still weak, because a service provider might try to drop some TTRN Registrations. (If Viewing Keys provided some completeness guarantee that could help. I don't think they do and I haven't read ZIP 310 yet.)
Second: I'm just ignoring all transactions except Sapling z2z. I haven't yet started thinking about transparent.
Third: I'm assuming other ecosystem-wide "standardizations" besides just lightwalletd deployments. For example, @teor2345 brought up the fact that TTRN txns have an extra shielded output. I would be inclined to address that by saying "all conforming wallets always include at least three shielded outputs: one for recipient, one for change, and one for TTRN. If any one of those cases is absent, the conforming wallet generates a dummy shielded output."
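The "always three shielded outputs" rule could look roughly like the sketch below. `ShieldedOutput` and `build_outputs` are hypothetical names invented for this example, and the dummy is modelled as a zero-value placeholder; how a real wallet would construct a dummy Sapling output is out of scope here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ShieldedOutput:
    role: str            # "recipient", "change", or "ttrn" (wallet-internal only)
    value: int           # amount in zatoshis
    is_dummy: bool = False


def build_outputs(
    recipient_value: Optional[int],
    change_value: Optional[int],
    ttrn_fee: Optional[int],
) -> List[ShieldedOutput]:
    """Return exactly three outputs, filling any absent case with a dummy."""
    slots = (
        ("recipient", recipient_value),
        ("change", change_value),
        ("ttrn", ttrn_fee),
    )
    outputs = []
    for role, value in slots:
        if value is None or value == 0:
            outputs.append(ShieldedOutput(role, 0, is_dummy=True))
        else:
            outputs.append(ShieldedOutput(role, value))
    return outputs


# A payment with no change and a TTRN fee still yields three outputs.
outs = build_outputs(recipient_value=50_000, change_value=None, ttrn_fee=1_000)
```

The `role` and `is_dummy` fields exist only inside the wallet. The point of the rule is that an observer sees the same output count whether or not TTRN was used, which only helps if all conforming wallets follow it.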
Obviously these ecosystem-wide changes are big rocks to move. So maybe there would be a "prototypical" phase of this protocol for experimentation and then a later "production deployment"?
Does this clarify the contours of this protocol approach somewhat? Is it still feasible? Can we refine it to something useful?
(ps: I have not yet read Tor's Guard Design linked from https://github.com/zcash/zips/pull/391#discussion_r469590686 . )
Light clients can be "watermarked" by the service provider. One not-very-scalable way to do that is to always keep the targeted user on an even block height and keep everyone else on an odd block height. When that user's wallet reconnects, you know it's them because they start asking from an even height, and everyone else would start asking from an odd height. Other changes to the protocol will make that even easier. For example, to save bandwidth, wallets don't want to fetch the entire mempool every time they poll it, so they'll remember some state about which mempool transactions they have, and return (some information about) that state to the service provider with each query. The service provider can watermark anything the wallet remembers and returns information about in future queries.
@defuse can a light wallet server watermark the transactions that a lightwallet sends using this approach? Is there a way to give a user's sent transactions some unique property by changing what transactions they receive?
To answer my own question, if you're party to a given transaction, one way would be to omit delivering certain transactions to a given light wallet user and see if their response implies that they didn't receive those transactions. But I don't know if this could be used for linking?
@nathan-at-least to understand this better, I think we should build up from simpler changes and look at the privacy gain (and potentially loss) at each stage, to get a better idea of how this fits in.
Given your note that ECC has higher priorities, this also might be helpful for identifying lower-hanging fruit solutions that would fit into ECC's roadmap in the near term.
For example:
1. What if there existed a lightwallet server run by a credibly secure entity, like Infura or Cloudflare's version of Infura?
2. What if it was the suggested default for all light wallet users, i.e. opt-out not opt-in?
3. What if the default full node behavior was to send its transactions to this server too?
4. What if this service was run by individuals with clear values that aggressively purged logs, promised to do so in a data retention policy, had a warrant canary, and followed other best practices?
5. What if users can choose the service they trust from a short list of approved providers? (You could choose an organization like Riseup, like Signal, or like Cloudflare, depending on your threat model, and services could be delisted if their userbases were too small.)
6. What if there was a relayer incentive like the one you propose, to make it easier for small organizations, like Riseup, to run such a service?
7. What if you connect to the chosen organization over Tor?
8. What if you use a new Tor circuit each time?
9. What if you address the question of cover traffic, to make Zcash users indistinguishable from other Tor users or at least make them indistinguishable from other Zcash users?
10. What if you grow the list of providers, using a similar policy to the policy Tor uses for guard nodes? Or what if you use Tor guard nodes themselves as per @teor2345's suggestion?
11. What if you throw this role open to all-comers who pay some fee, as in the proposal here?
Most of these build upward, though maybe the final one is a step back. I think at #3 Zcash reaches the level of privacy of the standard centralized service model, which is something! And at #4 Zcash reaches the level of privacy of Signal, or a really good VPN with many users (even better!) There might be some step backwards as you increase the number of providers, especially if some providers have small numbers of users. And then at 7 or 8 you reach the level of protection offered by TorBrowser.
Is this helpful? It's helpful for me in thinking about it.
Also, I think it might be helpful to clarify that the goal of this is not to solve network level privacy in a general sense, but to address the problem of delivering transactions to the network such that they aren't linkable to each other, to a given IP in the act of sending, or to a given receiver (who might reveal their IP or other identifying information in other ways.) I'm not sure about this last part about linkability to receivers and feel like I'm stretching, but does it make sense?
to understand this better, I think we should build up from simpler changes and look at the privacy gain (and potentially loss) at each stage, to get a better idea of how this fits in.
Excellent approach. Before even that, let's review the current state of mobile wallet privacy. Here are my beliefs about the status quo:
1. The wallets which support Zcash which have the most users do not support z-addrs at all. (Note: most Zcash users might not use those wallets, though. I'm not certain of the situation currently.) In my opinion, even if they offered some kind of network privacy or purging logs or whatever, that's somewhat moot due to on-chain analysis. (This should partially clarify why ECC has other higher priorities: network privacy doesn't matter until we get most users using shielded wallets in the first place!)
2. Shielded Full node wallets have excellent privacy with respect to receiving transactions, because all full nodes receive every transaction.
3. Shielded Full node wallets which send transactions potentially reveal an association of their "p2p topology position" to the transaction that they send. If they do not use Tor or equivalent network privacy, this association may also include their IP address. (I'm hunting for better public research on this area of vulnerability…)
4. Shielded Light wallets (which use @Electric-Coin-Company's lightwalletd design) are especially vulnerable to surveillance by the lightwalletd hosting provider as described in the threat model.
Specifically for 4, let's drill into the first two bullets for the Network- and Lightwalletd-Surveiling Adjacent-App Adversary:
There are several known weaknesses that this kind of adversary can exploit. The adversary can…
- tell that and when the user received a fully-shielded transaction. […]
- tell that and when the user sends a fully-shielded transaction. […]
Proposal Goal
This proposal intends to fix the second bullet here as well as point 3 above for full nodes.
Risk to Social Graph
Given the two bullets quoted here from the mobile wallet threat model, I would advocate against the "put everything on one service provider that everyone believes is really trustworthy" model which @holmesworcester suggests as 1, 2, and 4, because these vulnerabilities allow a lightwalletd hosting provider to track a comprehensive set of every shielded transaction sent and received by each specific client. By then correlating which sends/receives are to other users of the same system, a topology of which users interact with which other users (the "social graph") can be tracked, which is very harmful to privacy (even if amounts and memos are still protected).
Intuition about separating send/receive
My intuition is that a better first step from 1 or 2 is to separate the sending and receiving infrastructure. To send, a wallet can connect to anyone who can relay the transaction and doesn't need any a priori relationship or state. To receive efficiently, by contrast, the client necessarily leaks state which can be used to profile the client. For this reason I think @holmesworcester's 7-10 (the Tor stuff) is more likely to help with sending than receiving. (The receiving problem maps pretty directly to "Private Information Retrieval" research, IIUC.)
Key Question: If we could break the association between which transactions a light wallet sends from which they receive, does this meaningfully disrupt the ability to track the "social graph"?
Send/Receive Split Thought Experiment
So here's a much simpler strawperson proposal focused on that question: There's a totally malicious lightwalletd host, called The Receive Service (aka RS) and everyone uses them. (Assume all wallets are light clients for simplicity.) Then, there's a separate totally malicious transaction relay provider called The Send Service (aka SS). However, the two services do not (by assumption, exploratory not realistic) collaborate in any way and share no information.
Now how well is the social graph protected? (There can be different sub-cases for active versus passive targeted or passive dragnet surveillance, too…)
Send/Receive Thought Experiment Scenarios
Scenario 1: Alice sends a transaction, T, to Bob for the first time.
- Assertion 1.a.: SS knows Alice sent T but cannot tell who received it, or more precisely couldn't guess the receiver at any rate better than chance.
- Assertion 1.b.: RS knows Bob received T but cannot tell who sent it (cannot guess sender better than chance).
Scenario 2: Alice sends T1 to Bob, then some time later Alice sends T2 to Bob.
- Assertion 2.a.: SS knows Alice sent T1 & T2 but cannot tell who received either. Furthermore, SS cannot tell if T1 & T2 went to the same recipient or different recipients. Concern: it seems reasonable to guess there's some likelihood that T1 and T2 are to the same recipient because every participant only knows so many other participants and interactions with known counterparties are pretty frequent compared to new counterparties, so I'm not sure how to reason about this heuristic and what danger it poses to privacy.
- Assertion 2.b.: RS knows Bob received T1 & T2 but cannot tell who sent them, or if they had the same or different senders. Likewise with the concern above, there may be a similar heuristic guess that they were the same sender.
Scenario 3: Alice sends T1 to Bob, then some time later Bob sends T2 to Alice.
- Assertion 3.a.: SS knows Alice sent T1 & Bob sent T2 but cannot tell recipients. Concern: what about using the heuristic that the frequency that someone sends a transaction to someone who previously sent them a transaction is high relative to the chance that they send to someone who has never sent to them?
- Assertion 3.b.: RS knows Alice received T2 & Bob received T1 but cannot tell senders. Again, there's a similar concern about the general heuristic pattern that people tend to interact with people they've interacted with before.
One thought about these heuristic guesses that people interact more with those they've already interacted with: the higher the overall transaction rate relative to the time between related transactions, the better the system would seem to be at thwarting those guesses.
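The scenarios above can be written down as a small model of what each service observes. This is only a sketch of the thought experiment: the `Tx` type and the three view functions are hypothetical, and the names "Alice" and "Bob" stand in for whatever identifier each service can actually attribute to a client.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Tx(NamedTuple):
    txid: str
    sender: str
    receiver: str


def send_service_view(txs: Iterable[Tx]) -> Dict[str, List[str]]:
    """What SS learns: which txids each sender submitted, nothing about receivers."""
    view = defaultdict(list)
    for tx in txs:
        view[tx.sender].append(tx.txid)
    return dict(view)


def receive_service_view(txs: Iterable[Tx]) -> Dict[str, List[str]]:
    """What RS learns: which txids each receiver detected, nothing about senders."""
    view = defaultdict(list)
    for tx in txs:
        view[tx.receiver].append(tx.txid)
    return dict(view)


def colluding_view(
    ss: Dict[str, List[str]], rs: Dict[str, List[str]]
) -> Set[Tuple[str, str]]:
    """If SS and RS shared data, joining on txid recovers the social graph edges."""
    sender_of = {txid: s for s, ids in ss.items() for txid in ids}
    return {
        (sender_of[txid], r)
        for r, ids in rs.items()
        for txid in ids
        if txid in sender_of
    }


# Scenario 3: Alice sends T1 to Bob, then Bob sends T2 to Alice.
txs = [Tx("T1", "Alice", "Bob"), Tx("T2", "Bob", "Alice")]
ss = send_service_view(txs)      # {"Alice": ["T1"], "Bob": ["T2"]}
rs = receive_service_view(txs)   # {"Bob": ["T1"], "Alice": ["T2"]}
edges = colluding_view(ss, rs)   # {("Alice", "Bob"), ("Bob", "Alice")}
```

Neither view alone contains a sender/receiver pair, which is the content of assertions 1.a through 3.b. The model says nothing about the heuristic guesses raised as concerns; those would need a statistical model of who transacts with whom.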
Next Steps
I'm not sure what the next steps are here. Is this strawperson idea helpful compared to the "single provider" model where a provider does both sending and receiving? Also, how does it compare to the "many providers" model where there are K providers who do both send/receive, but they do not collude?
A few thoughts on this:
Key Question: If we could break the association between which transactions a light wallet sends from which they receive, does this meaningfully disrupt the ability to track the "social graph"?
This is a separate question but it's really been eating at me. Letting a malicious lightwalletd "tell that and when the user received a fully-shielded transaction" does not seem like an appropriate concession to make in exchange for a mere 70% bandwidth reduction. Let's fetch all memos by default to remove that known vulnerability, until there's a better solution that doesn't sacrifice privacy! Once we do that, any data leakage about what transactions are interesting to a user becomes a bug, whether in the light or full node context. And then we can focus on the Send Service side of your example.
Then, there's a separate totally malicious transaction relay provider called The Send Service (aka SS)
For this part, if you are on a different Tor circuit every time you contact the Send Service, that's good enough, right?
In your proposal it's Tor, and creating unique Tor circuits, that provides the privacy and anonymity, right? If you connected to a relay in your relay network not via Tor, some number of those relays could be malicious and use your IP to attack you similar to a centralized Send Service (just with less precision and omniscience perhaps?) @teor2345 makes the point that the TTRN practically guarantees that at least a few relays will be malicious. So it seems like Tor is the thing being relied on here for privacy and not the TTRN.
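One way a wallet might get a different circuit for every send is Tor's stream isolation by SOCKS credentials. The sketch below assumes a local Tor client on the default SocksPort (127.0.0.1:9050) with the `IsolateSOCKSAuth` behaviour left enabled, so that connections presenting different SOCKS usernames and passwords are not placed on the same circuit. The function name is hypothetical, and the code only builds the proxy URL; it does not open a connection.

```python
import secrets

TOR_SOCKS_HOST = "127.0.0.1"  # assumed: a local Tor client
TOR_SOCKS_PORT = 9050         # assumed: Tor's default SocksPort


def isolated_proxy_url(
    host: str = TOR_SOCKS_HOST, port: int = TOR_SOCKS_PORT
) -> str:
    """Build a SOCKS5 proxy URL with fresh random credentials for one send.

    The credentials are never checked by Tor; they act only as an isolation
    label, so each new pair should land on a separate circuit.
    """
    user = secrets.token_hex(8)
    password = secrets.token_hex(8)
    # "socks5h" asks the HTTP client to resolve hostnames through the proxy.
    return f"socks5h://{user}:{password}@{host}:{port}"


first = isolated_proxy_url()
second = isolated_proxy_url()
```

A wallet would pass a new URL like this to its HTTP or gRPC client for each transaction submission. Whether separate circuits are enough against a Send Service that also sees timing is the open question in this thread.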
The more I think about this, the advantage of a TTRN over a single Send Service is structural legitimacy and not privacy. Tor is providing the privacy. And a Send Service with a fee would be scalable. But there's a legitimacy question of "who gets to run the one Send Service?" and arbitrarily appointing someone to do it seems illegitimate, and could lead to monopoly problems like high fees or bad service. But is it a bad answer for privacy, relative to the other options? If so I'm missing why.
Also, in exchange for the structural legitimacy of the TTRN, you have to worry about how to ensure everybody gets the same list of relays (so that they can't be fingerprinted) and that the relays are reliable enough for sending transactions, and maybe other complexities that aren't as obvious. It's not that this is bad, but if the TTRN doesn't provide additional privacy compared to a centralized Send Service and the reason is structural legitimacy, it's good to be clear that we're saddling a solution to a pressing privacy problem with other non-privacy requirements. And we might want to break up the two pieces into separate issues and attack them incrementally, in the interests of getting a solution to the privacy problem sooner.
One caveat here is that for the Send Service to be equivalent to the TTRN in terms of privacy it has to be the only Send Service. And that's maybe just not practically going to happen as a long term solution since there will be competition and disagreement about the best approach. So I can see that being an argument for a TTRN: it's a practical long-term solution to create a uniform method that everyone uses to send transactions. Unless there's some hole in my analysis up to this point, I think that's the best way to make the case for it: as a practical universal solution.
This also means that it's important to achieve strong privacy in the "single Sending Server" case as a necessary first step to making the TTRN approach work. Given the existence of Tor, it's going to be easier to figure out how to not leak data to a single Sending Server or a TTRN than it is to design the TTRN so that it can't infer data that would be leaked if there was a single Sending Server. Designing the TTRN that way requires reinventing Tor, right?
This is a separate question but it's really been eating at me. Letting a malicious lightwalletd "tell that and when the user received a fully-shielded transaction" does not seem like an appropriate concession to make in exchange for a mere 70% bandwidth reduction. Let's fetch all memos by default to remove that known vulnerability, until there's a better solution that doesn't sacrifice privacy! Once we do that, any data leakage about what transactions are interesting to a user becomes a bug, whether in the light or full node context. And then we can focus on the Send Service side of your example.
This hits the nail on the head. Privately grabbing the memo is also readily solvable with no protocol changes to Zcash. I did benchmarks using SealPIR and it was totally practical. The only thing that needs to be done is to actually write a simple web service that exposes something like HTTP GET /transactions/[time_interval]/pir/[index] and the scripts to populate the database.
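For readers unfamiliar with PIR, here is a toy illustration of the idea of fetching record `index` without revealing `index`. To be clear about the substitution: this is the classic two-server XOR scheme, not SealPIR, which is a single-server scheme built on homomorphic encryption. The toy version only hides the index if the two servers do not collude, and it assumes all records have the same length. All function names are made up for the example.

```python
import secrets
from typing import List, Tuple


def make_queries(n: int, index: int) -> Tuple[List[int], List[int]]:
    """Split a request for `index` into two bit vectors, one per server.

    Each vector on its own is uniformly random; they differ only at `index`.
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2


def answer(db: List[bytes], query: List[int]) -> bytes:
    """Server side: XOR together every record selected by the query bits."""
    acc = bytes(len(db[0]))  # all-zero accumulator, one record long
    for record, bit in zip(db, query):
        if bit:
            acc = bytes(a ^ b for a, b in zip(acc, record))
    return acc


def reconstruct(a1: bytes, a2: bytes) -> bytes:
    """Client side: XOR the two answers; everything but the target cancels."""
    return bytes(x ^ y for x, y in zip(a1, a2))


# Five equal-length records standing in for the memos of one time interval.
db = [bytes([i]) * 8 for i in range(5)]
q1, q2 = make_queries(len(db), index=3)
memo = reconstruct(answer(db, q1), answer(db, q2))
```

In the web service shape suggested above, `answer` is roughly what a `/pir/` endpoint would compute over the records of one time interval, with the real cryptography replaced by whatever the chosen PIR library provides.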
The more I think about this, the advantage of a TTRN over a single Send Service is structural legitimacy and not privacy. Tor is providing the privacy. And a Send Service with a fee would be scalable. But there's a legitimacy question of "who gets to run the one Send Service?" and arbitrarily appointing someone to do it seems illegitimate, and could lead to monopoly problems like high fees or bad service. But is it a bad answer for privacy, relative to the other options? If so I'm missing why.
Well said, this is my read as well. And I think 1) we need default Tor support sooner rather than later, so it should be as simple as possible, and 2) it's not a big legitimacy issue. We have nonprofits for doing this kind of stuff. And someone already has to run the DNS seeders.