Route Blinding (Feature 24/25)
Route blinding allows a recipient to provide a blinded route to potential payers. Each node_id in the route is tweaked, and dummy hops may be included.
This is an alternative to rendezvous to preserve recipient anonymity. It has a different set of trade-offs: onions are re-usable, but the privacy guarantees are a bit weaker and require more work (e.g. when handling errors).
The main areas I'd like reviewers to focus on at first are:
- the crypto itself, and whether we're missing an abstraction (it's really a kind of reverse sphinx) or a simpler way to do this
- whether we can live with some update_add_htlc that have a different length than other update_add_htlc (which kind of gives away the blinded path if you're watching the network traffic)
- error management: how do we ensure errors don't leak the blinded path? Obfuscation + delayed responses may do the trick, but is it enough (and is it necessary)?
The attack scenarios that reviewers should focus on include (but are not limited to):
- can attackers figure out the identity of the recipient, and if so, with what kind of attack and resources?
- note that attackers can do any kind of message tampering, delaying or dropping if they're intermediate nodes
@t-bast on E2E issues with a sender requiring multiple invoices: you may bound this through some utxo-pinning stuff, e.g. if we adopt PoDLE (or any confidential-utxo declaration scheme) for dual-funding, we may reuse it for this. "To require a new invoice, announce me another channel utxo, or you can't have more than X invoices registered for one channel utxo". Maybe not for a v1, but something to explore?
"To require a new invoice, announce me another channel utxo, or you can't have more than X invoices registered for one channel utxo"
That's an interesting idea, it would force payers to have some skin in the game (multiple channels). It's worth exploring!
@ariard @cdecker thanks for the early reviews, I marked many comments as resolved to make it more readable for newcomers, feel free to unresolve if you think it's useful.
@rustyrussell I have updated the PR. It now contains formal spec requirements on how to use route blinding. It's still a bit rough around the edges, it's a complex mechanism so may be hard to read, hopefully reviewers will help make it better.
I didn't specify how to handle errors though, as it's really just a low-level tool as currently specified: upper layers depending on route blinding should have their own requirement on how to handle failures (usually due to an adversary probing).
I also added two (big) test vectors that exercise a lot of the internal code, can you verify them? We should squash incompatibilities early before we start using route blinding in onion messages (and have another round of potential compatibility issues!).
You should also now be able to rebase onion messages on top of this branch, which should make the review simpler (cc @TheBlueMatt @thomash-acinq).
I also have a concise eclair implementation in https://github.com/ACINQ/eclair/pull/1962 with tests that showcase how route blinding could be used in payments, with more comments than I could put in the spec. Readers may find it helpful!
@rustyrussell I updated the test vectors after addressing your comments. Without the ephemeral blinding key in the AAD, we're likely to be compatible now!
OK, so now I'm looking at the test vectors (finally!).
They're missing the secret keys for the nodes: can we stick with Alice, (AAAA ie. 0x414141...), Bob, etc? Without this you can't regenerate the test vectors, or actually test that the ephemeral keys are correct?
Note that in general, onion test vectors can't be reproduced because the onion padding is random. We could override that (and use all-zeroes as "random"), of course.
And route-blinding-override-test should be merged into route-blinding-test IMHO; it's not like you can avoid implementing that part!
Ideally, the route blinding tests would be reused in the onion test, so an implementer can get the encoded tlvs right, then move on to apply it to the whole onion.
They're missing the secret keys for the nodes: can we stick with Alice, (AAAA ie. 0x414141...), Bob, etc?
The secret keys are here, in the unblinding object in the node_privkey fields (and I did indeed keep the Alice, Bob, etc).
You only need them when unblinding the route, not when creating it, right?
When you create the blinded route, you only need the node IDs and the first hop's ephemeral_privkey, which should be generated randomly by the sender (we use 0101010101010101010101010101010101010101010101010101010101010101). Do you want me to make this more explicit by lifting this value outside of the hops array, at the top-level of the blinding object and calling it session_key?
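To make the split between creating and unblinding concrete, here is a minimal sketch in Python. It is not the spec: the derivation formulas (shared secret as SHA256 of the compressed ECDH point, the "blinded_node_id" HMAC key, the ephemeral key update) are my reading of the proposal and should be checked against the spec text and test vectors. The function names (blind_route, unblind_hop) are made up for illustration, and the pure-Python curve arithmetic is not constant-time, so it is only suitable for experimenting. It reuses the Alice (0x41...) and Bob (0x42...) keys and the 0x01... session key mentioned above.

```python
import hashlib
import hmac

# secp256k1 parameters (standard curve constants).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def point_mul(k, pt):
    """Double-and-add scalar multiplication (NOT constant-time)."""
    result, addend, k = None, pt, k % N
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def ser(pt):
    """33-byte compressed point encoding."""
    return bytes([2 + (pt[1] & 1)]) + pt[0].to_bytes(32, "big")


def sha256(data):
    return hashlib.sha256(data).digest()


def to_int(data):
    return int.from_bytes(data, "big")


def ecdh(privkey, pubkey):
    # Assumption: shared secret = SHA256(compressed ECDH point).
    return sha256(ser(point_mul(privkey, pubkey)))


def tweak(ss):
    # Assumption: node id tweak = HMAC256(key="blinded_node_id", msg=ss).
    return to_int(hmac.new(b"blinded_node_id", ss, hashlib.sha256).digest())


def blind_route(session_key, node_ids):
    """Creator side: needs only public node ids and a random session key."""
    e = session_key
    first_point = point_mul(e, G)
    blinded_ids = []
    for node_id in node_ids:
        eph_point = point_mul(e, G)
        ss = ecdh(e, node_id)
        blinded_ids.append(point_mul(tweak(ss), node_id))
        e = e * to_int(sha256(ser(eph_point) + ss)) % N
    return first_point, blinded_ids


def unblind_hop(node_privkey, eph_point):
    """Hop side: needs its own private key and the received ephemeral point."""
    ss = ecdh(node_privkey, eph_point)
    blinded_privkey = tweak(ss) * node_privkey % N
    next_point = point_mul(to_int(sha256(ser(eph_point) + ss)), eph_point)
    return blinded_privkey, next_point


ALICE = to_int(b"\x41" * 32)
BOB = to_int(b"\x42" * 32)
SESSION_KEY = to_int(b"\x01" * 32)

first_point, blinded_ids = blind_route(
    SESSION_KEY, [point_mul(ALICE, G), point_mul(BOB, G)])
alice_blinded_priv, point_for_bob = unblind_hop(ALICE, first_point)
bob_blinded_priv, _ = unblind_hop(BOB, point_for_bob)
```

The creator never touches a node private key, and each hop only learns the ephemeral point it receives, which is the separation the test vectors try to reflect with their distinct blinding and unblinding objects.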
Note that in general, onion test vectors can't be reproduced because the onion padding is random. We could override that (and use all-zeroes as "random"), of course.
IIRC we all implemented the same deterministic version described in https://github.com/lightning/bolts/pull/697 so it should be reproducible, shouldn't it? But it's true that people using it differently wouldn't be able to reproduce this...
And route-blinding-override-test should be merged into route-blinding-test IMHO; it's not like you can avoid implementing that part!
Sure, I'll merge them.
Ideally, the route blinding tests would be reused in the onion test, so an implementer can get the encoded tlvs right, then move on to apply it to the whole onion.
The reason I made the onion one separate is that it makes more sense if it shows how the sender adds normal hops before the blinded route, and I wanted to keep our existing 5-hops test nodes, which restrains the route blinding part to only 3 hops, which would make the blinding override test vector a bit lame...
They're missing the secret keys for the nodes: can we stick with Alice, (AAAA ie. 0x414141...), Bob, etc?
The secret keys are here, in the unblinding object in the node_privkey fields (and I did indeed keep the Alice, Bob, etc).
You only need them when unblinding the route, not when creating it, right?
Oh, I missed the decrypt section! Sorry!
When you create the blinded route, you only need the node IDs and the first hop's ephemeral_privkey, which should be generated randomly by the sender (we use 0101010101010101010101010101010101010101010101010101010101010101). Do you want me to make this more explicit by lifting this value outside of the hops array, at the top-level of the blinding object and calling it session_key?
Note that in general, onion test vectors can't be reproduced because the onion padding is random. We could override that (and use all-zeroes as "random"), of course.
IIRC we all implemented the same deterministic version described in #697 so it should be reproducible, shouldn't it? But it's true that people using it differently wouldn't be able to reproduce this...
Pretty sure we chose that approach, so I guess I'll try to repro the vectors and see!
And route-blinding-override-test should be merged into route-blinding-test IMHO; it's not like you can avoid implementing that part!
Sure, I'll merge them.
Ideally, the route blinding tests would be reused in the onion test, so an implementer can get the encoded tlvs right, then move on to apply it to the whole onion.
The reason I made the onion one separate is that it makes more sense if it shows how the sender adds normal hops before the blinded route, and I wanted to keep our existing 5-hops test nodes, which restrains the route blinding part to only 3 hops, which would make the blinding override test vector a bit lame...
Erk, yeah. For onion messages there are no "normal" hops.
I think it can still have one of each: Alice gets a normal hop, Bob gets a blinded path, Carol gets a blinding override and Dave is terminal? Or am I miscounting?
Ack bolt04/route-blinding-test.json BTW (a9e73be6b8bd86aa938318e3a7fc3370c5be675b)
route-blinding-override-test is annoying to work with. I really prefer to combine everything (set and derived items) into one object, resulting in one single array, rather than this separation where I have to examine multiple different objects at once (even though a real implementation would do that).
Of what I did test, ack bolt04/route-blinding-override-test.json BTW (a9e73be)
I really prefer to combine everything (set and derived items) into one object, resulting in one single array, rather than this separation where I have to examine multiple different objects at once (even though a real implementation would do that).
I thought that separating them made sense, because then you're able to test sub-components independently (creating the route and unwrapping it is usually done by distinct nodes, that don't have access to the same data, so this ensures each component only has access to the minimal amount of data it needs). But maybe that was over-engineering it...shall we wait for a 3rd implementation to chime in to help us converge on a final format for these test vectors?
Ack bolt04/route-blinding-test.json BTW (a9e73be) ack bolt04/route-blinding-override-test.json BTW (a9e73be)
Awesome! So we should easily get interoperability for onion messages now ;)
I think it can still have one of each: Alice gets a normal hop, Bob gets a blinded path, Carol gets a blinding override and Dave is terminal? Or am I miscounting?
That is true, I can re-work them that way and merge them into a single (bigger) test vector. Do you want me to do that now, or should we wait for another pair of eyes to discuss the format before?
@rustyrussell I updated the test vectors and squashed. There is now a single route blinding test vector (that tests blinding override) and a payment onion test vector.
I think I've made the route blinding test vector simpler to work with (the hard part is how to convey to implementers how they should build that test route, since it's the concatenation of two routes that are built by unrelated nodes). Hopefully the comments make it clear enough, let me know!
I've kept the generation of the route and its processing (unblinding at each hop) separate, as these are really two different things to test and I think it's clearer for implementers to see that nodes don't have access to the same data depending on what operation they're doing.
The payment onion test vector then directly re-uses this blinded route, and I believe your onion messages test vector could do the same?
OK, now I'm finally updating our long-experimental HTLCs-with-blinding code.
I note that you put the blinding factor in the onion? I initially had it passed in a tlv with update_add_htlc. That saves onion space (33 bytes per hop) and mirrors how onion messages work.
It also means I only need to provide you with enctlvs and node_ids, and the initial blinding factor E(0). Otherwise I need to give you all the E(n) blinding factors, which I think means you could choose to only use PART OF the path (thus de-anonymizing the first part of the path)?
Yes, I think it's better (at least in the beginning) to use the onion to send the blinding factor to the introduction node. This way, nodes before the introduction node (in the non-blinded part of the route) don't need to support route blinding.
If you are a wallet behind an LSP that supports route blinding, and senders support it, you'll be able to use it right away.
Whereas if we use an update_add_htlc TLV in the non-blinded part of the route, it means the sender needs to find a route with only nodes that support route blinding, and if they can't find one, they cannot use your blinded route, which is a pity because there's really no reason to require the non-blinded part of the route to understand the feature.
Another reason to put it in the onion is that nodes in the non-blinded part of the route don't even know that route blinding is being used, which I believe is a good thing - the less they know, the better.
Are those good enough arguments?
It also means I only need to provide you with enctlvs and node_ids, and the initial blinding factor E(0). Otherwise I need to give you all the E(n) blinding factors, which I think means you could choose to only use PART OF the path (thus de-anonymizing the first part of the path)?
I don't understand that part. Whether we send the initial blinding factor to the introduction point via the onion or a TLV in update_add_htlc doesn't seem to change anything here?
@t-bast asked:
- whether we can live with some update_add_htlc that have a different length than other update_add_htlc (which kind of gives away the blinded path if you're watching the network traffic)
I am a bit confused. How can that happen when using blinded paths? As of BOLT 2, update_add_htlc includes an onion_routing_packet with a fixed size of 1366 bytes. According to this proposal, the blinded path would be added as encrypted_data to the tlv_stream of the tlv_payload for a hop inside the onion.
In case I am missing something, wouldn't the following recommendation help:
Sending nodes SHOULD add an empty blinded path to the onion_routing_packet even if they don't want to use blinding?
On a more general note: I thought the transport layer used to send messages is encrypted anyway, which prevents an attacker from seeing which messages are being sent. So if messages varied in length, one would not directly know that a given message is an update_add_htlc, which seems to me like an improvement. (Though one might conclude from the context of the surrounding fixed-length messages (revoke_and_ack, commitment_signed, update_fulfill_htlc, ...) that an update_add_htlc was being sent.)
I am a bit confused. How can that happen when using blinded paths?
That's because inside the blinded part of the route, when nodes forward HTLCs, they have to include the next ephemeral blinding point in the tlv stream of their outgoing update_add_htlc, which adds bytes to the message compared to a normal update_add_htlc that doesn't contain a tlv stream.
It forwards the onion to the next node and includes E(1) in a TLV field in the message extension (at the end of the update_add_htlc message).
On a more general note: I thought the transport layer to send messages is encrypted anyway not allowing an attacker to see which messages are being sent. So if messages varied in length one would not know directly that it would be an update_add_htlc message which seems to me like an improvement.
This is a valid point, to be honest I really don't know whether we must try to keep update_add_htlc fixed length or not, maybe making it variable-length thanks to additional tlvs in the extension is actually a good thing as you suggest. That deserves more analysis. If we want to keep it fixed-size, then it's easy, we can mandate that all nodes send a dummy ephemeral blinding point (all zeroes) when route blinding is unused.
Though one might conclude from the context of the surrounding fixed-length messages (revoke_and_ack, commitment_signed, update_fulfill_htlc, ...) that an update_add_htlc was being sent
commitment_signed isn't fixed size, since it depends on the number of pending htlcs in the channel, but it's true that the others are. Adding random padding data in their tlv extension could make sense to ensure they're never fixed-size if we want to thwart this kind of analysis. That's an interesting area to explore.
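For anyone who wants to see the size difference in numbers, here is a rough Python sketch. The field sizes follow the BOLT 2 update_add_htlc layout with the 1366-byte onion cited above. The TLV type number (0) for the blinding point and the helper names are assumptions made for illustration, not something this thread settles.

```python
ONION_ROUTING_PACKET_LEN = 1366  # fixed onion size, as cited in the thread
POINT_LEN = 33                   # compressed secp256k1 point

# Assumption: the blinding point travels as TLV type 0 in the message extension.
BLINDING_POINT_TLV_TYPE = 0


def bigsize(n):
    """Encode an integer using the BigSize variable-length format."""
    if n < 0xFD:
        return n.to_bytes(1, "big")
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "big")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "big")
    return b"\xff" + n.to_bytes(8, "big")


def tlv_record(tlv_type, value):
    """Encode one TLV record: type, length, value."""
    return bigsize(tlv_type) + bigsize(len(value)) + value


def update_add_htlc_len(blinding_point=None):
    """Byte length of update_add_htlc, with or without the blinding point TLV."""
    base = (
        2      # message type
        + 32   # channel_id
        + 8    # id
        + 8    # amount_msat
        + 32   # payment_hash
        + 4    # cltv_expiry
        + ONION_ROUTING_PACKET_LEN
    )
    if blinding_point is None:
        return base
    return base + len(tlv_record(BLINDING_POINT_TLV_TYPE, blinding_point))


plain_len = update_add_htlc_len()
blinded_len = update_add_htlc_len(b"\x02" + b"\x11" * 32)  # placeholder point
padded_len = update_add_htlc_len(b"\x00" * POINT_LEN)      # all-zero dummy point
```

Under these assumptions the blinded variant is 35 bytes longer than the plain one, and sending an all-zero dummy point outside blinded routes makes every update_add_htlc the same length, which is the "keep it fixed-size" option mentioned above.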
What should I do with the remaining comments on the other PR? Should I just delete them, or can you mark them resolved?
I think it would be nice if you could delete them, to avoid cluttering the onion messages PR :+1:
Thanks for the review!
I think it would be nice if you could delete them, to avoid cluttering the onion messages PR +1
I hid them for now, as I might move some of the more stylistic comments here later (sorry, I didn't manage to do that yet)
Thanks for the review!
Thanks for the proposal and your replies. Will address the other topics later.
I have done quite a big update of this PR (thanks to a lot of ideas and feedback from @thomash-acinq) and re-worked the commit history.
The first commit only introduces the proposals document, which I highly recommend reviewers to read: more natural language and detailed examples will help you grasp the ideas and subtleties of the scheme before reviewing the bolt requirements.
The second commit adds formal requirements on how to build a blinded route, without tying this to a particular scenario: it contains the cryptographic details of how this thing works.
The third commit adds formal requirements detailing how payments should work with route blinding. Note that the test vectors in that commit are likely outdated and will be updated once we have more feedback on the proposal. Once we've finished bikeshedding the mechanism and tlv fields, I will also need to specify how they are encoded in Bolt 11 invoices (and in the future, Bolt 12 invoices).
Please make sure you thoroughly understand the probing attacks described here and help us figure out if the current mechanism is sufficient to protect against such attacks or if other probing vectors exist that we have missed.
@rustyrussell @Roasbeef @TheBlueMatt
A stupid question:
If a node has 400 channels but has to pick just a few possible routes to blind (maybe blinding even 2 or 3 hops), doesn't that negate all the benefits of being well-connected? The same applies to mobile or private nodes that are connected to just one peer which has 400 channels (the final receiver will have to pick a few of these 400).
It feels to me that this will be a very hard opt-in privacy move that most people will default to not use because they won't sacrifice payment success rates, then using blinded paths will become the mark of the criminal or wrongdoer like coinjoins are viewed today by some. What am I missing?
One immediate use case that isn't affected is the private node with just unannounced channels that blinds just the final hop -- but for these cases the "alias scid" proposal is probably already good enough.
That is a good question, selecting the blinded paths you'll give to payers does have some subtlety and will require different algorithms depending on your connectivity and the size of the expected payment.
If you're a mobile wallet then it's easy, you won't have a lot of different peers, so you can easily include a good enough number of them. If the amount you're trying to receive shouldn't need splitting, you can provide 2-hops or 3-hops blinded routes without impacting reliability. If the amount you're trying to receive is somewhat big though, you may need to instead provide 1-hop blinded routes with dummy hops added at the end: senders won't be able to know that you're using dummy hops so you still get privacy, but you're actually only using 1 hop so you'll be able to leverage most of your incoming liquidity.
If you are a public node with a lot of channels to different peers, you will most of the time want to use 1-hop blinded routes with dummy hops, unless the amount is small and should be easy to route towards you. Because you're a well-connected node, the dummy hops will look very credible. You will have to choose a subset of your 400 peers obviously, so if you're trying to receive amounts that require a split among many incoming routes, this may be problematic: but if you expect to receive very big payments regularly, having your incoming liquidity scattered between too many different peers is not a very good idea anyway in my opinion.
So the answer really depends on how big the payments you expect to receive are compared to subsets of your peers. I believe that this is something that should be taken into account when positioning yourself in the graph and opening channels, so it's not in my opinion a show-stopper for blinded paths at all, but I'm curious to get your thoughts on this.
My current thought is that the added complexity may not be a showstopper, but no matter what you do as a receiver, there is no way you won't lose some of your receivability since there will necessarily be some restriction of the possible routes the payment can take to arrive at you.
If that translates into a big enough decrease in payment success rates then people will start opting out of blinded paths and only super privacy-minded people will keep using them -- which is fine if we can have everything, but maybe it's not worth the trouble given the alternatives.
Among the alternatives, it feels like trampoline payments with trampoline hints on invoices + dummy trampoline hops + alias scid for unannounced channels have a better set of tradeoffs and should be prioritized.
I really don't see how - as the recipient you have strictly more knowledge than the sender and can select the same set of channels as the sender would pick anyway, but even better because you can select the channels that actually have enough capacity to receive a given payment now. This is a strict improvement in receiveability over existing lightning.
I agree with Matt, this makes reliability strictly better because the recipient adds local knowledge by choosing the blinded paths, helping the sender in choosing paths that will have liquidity on the recipient side.
I think the case you have in mind @fiatjaf is having 200 peers with for example only 1k sats of incoming liquidity with each of them, and trying to receive for example 100k sats. To receive such a payment you'd need the sender to split among 100 paths, and you don't have enough space to provide 100 blinded paths to the sender. But even if you publicly share your node_id, the probability that the sender is able to send you that payment is extremely low: if your node is in that degenerate case, you won't be able to reliably receive payments at all.
That's not what I have in mind. Let me try to explain better with an image (keep in mind I am not a computer scientist and I don't know graph theory or whatever these things are called, so please be patient):

In this example target has handpicked the two channels he knew that had incoming liquidity, then a random second hop for each (as they can't know channel balances of these second hops).
The only route that would succeed in this case is the green one -- because all the other possible routes are offline or have no liquidity (and target doesn't know, but source would find out while attempting to pay).
Therefore, by choosing to provide blinded routes target has limited the possibilities of being paid and (in this example) the payment didn't succeed.
The payment could have succeeded if, by luck, target had picked the second hop with the green line on it.
In this example target has handpicked the two channels he knew that had incoming liquidity, then a random second hop for each (as they can't know channel balances of these second hops).
You really don't need to pick two real hops in your blinded path when you think that the payment will be "hard" to receive, because you can add dummy blinded hops and no-one can know that they're dummy hops. What the target would do in your case is that the blinded paths that it will provide will look like:
Fake <--- Target <--- Peer1
Fake <--- Target <--- Peer2
If the target does that, the green route can be used and the payment will work. Does that make sense?
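The Fake <--- Target <--- Peer shape above can be sketched as a small selection routine. This is only an illustration under stated assumptions: the Peer type, the build_blinded_paths function, and the "most inbound liquidity first" ranking are all hypothetical, and a real implementation would also need to produce the encrypted payloads for each hop. Paths are listed in the sender's direction, from the introduction node to the final (dummy) hop.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    inbound_msat: int  # liquidity the target knows it can receive via this peer


def build_blinded_paths(peers, amount_msat, max_paths, dummy_hops=1):
    """Pick 1-hop blinded paths padded with dummy hops.

    Returns (paths, covers_amount). Each path is a list of hop labels:
    the peer is the introduction node, "Target" is the real recipient,
    and each "Fake" entry is a dummy hop processed by the target itself.
    """
    ranked = sorted(peers, key=lambda p: p.inbound_msat, reverse=True)
    chosen, covered = [], 0
    for peer in ranked:
        # Stop when we have enough paths, enough liquidity, or only empty peers left.
        if len(chosen) == max_paths or covered >= amount_msat or peer.inbound_msat == 0:
            break
        chosen.append(peer)
        covered += peer.inbound_msat
    paths = [[peer.name, "Target"] + ["Fake"] * dummy_hops for peer in chosen]
    return paths, covered >= amount_msat


peers = [Peer("Peer1", 60_000), Peer("Peer2", 50_000), Peer("Peer3", 0)]
paths, covers_amount = build_blinded_paths(peers, amount_msat=100_000, max_paths=4)
```

Because the target only commits to its direct peers, the sender remains free to pick any route up to Peer1 or Peer2, so a route like the green one stays usable.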
Yes.
OK, I made a new diagram with another situation. This time the receiver is only using one blinded hop. If he had used two it would be even worse, of course.

The point here is that by using blinded paths the receiver forces the sender to use a different route than the best routes it would be able to find otherwise, and that introduces more chances of the payment failing -- by introducing more hops that could be offline or not have liquidity (not to mention more fees or more delay).
It may look like I'm cheating by just coming up with bad examples, but I'm trying to show that these bad examples are possible, and I believe they will be the norm (even though sometimes they won't be super bad, but they will be at least a little bad, which was my point all along).
Yep, if you only give a few options and they're really far away from your other options, you're right, it could be worse. In practice, the diameter of the lightning network today is really really low. The blinded path generator can select the hops that are as far from each other as possible (ie do BFS until you get to a common point and select the channels with the furthest loop), which would guarantee you won't hit this case, modulo an extra hop or two, but that is more than counteracted by the recipient using their own channel amount-available data and knowledge of their local graph from recent payments they made to score the channels by success probability. In that case, you no longer are making the sender take an extra unreliable hop, you're making them take an extra hop but guaranteeing it has enough liquidity, so it'll work.