draft: HTLC Endorsement to Mitigate Channel Jamming

Open carlaKC opened this issue 1 year ago • 14 comments

This PR introduces an endorsed TLV to update_add_htlc as a way for nodes to indicate whether they expect an HTLC to resolve "honestly". Nodes are advised to allocate a limited portion of their outbound liquidity and slots to HTLCs that are not endorsed by peers that they consider to have high reputation.
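To make the bucketing idea concrete, here is a minimal sketch of how a forwarding node might gate unendorsed HTLCs. It is not taken from the PR: the `Channel` fields, the `try_add_htlc` helper, and the 50% `general_fraction` are all hypothetical, and the boolean `peer_high_reputation` stands in for whatever local reputation scoring a node uses.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """Outgoing channel resources. All names and numbers are illustrative."""
    total_slots: int
    total_msat: int
    general_fraction: float = 0.5  # assumed share available to unprotected HTLCs
    used_slots: int = 0
    used_msat: int = 0
    general_slots: int = 0
    general_msat: int = 0


def try_add_htlc(chan: Channel, amount_msat: int,
                 endorsed: bool, peer_high_reputation: bool) -> bool:
    """Return True if the HTLC is forwarded, False if it is rejected."""
    # Every HTLC must fit within the channel's overall limits.
    if (chan.used_slots + 1 > chan.total_slots
            or chan.used_msat + amount_msat > chan.total_msat):
        return False

    # Only HTLCs endorsed by a high-reputation peer may use the full channel.
    protected = endorsed and peer_high_reputation
    if not protected:
        slot_cap = int(chan.total_slots * chan.general_fraction)
        msat_cap = int(chan.total_msat * chan.general_fraction)
        if (chan.general_slots + 1 > slot_cap
                or chan.general_msat + amount_msat > msat_cap):
            return False
        chan.general_slots += 1
        chan.general_msat += amount_msat

    chan.used_slots += 1
    chan.used_msat += amount_msat
    return True
```

In this sketch an attacker without reputation can fill at most the general share of the channel, while endorsed traffic from high-reputation peers still has room to flow.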

Opening early for discussion on structure, not ready for review - discussions around recommendations for local reputation scoring are ongoing.

Slides for the visually-minded here

carlaKC avatar Apr 28 '23 18:04 carlaKC

Hi, I find this method of mitigating HTLC jamming quite interesting; however, I have one question: what impact will using a reputation system on local channels have on the centralization of the network?

Previously, a node could achieve a higher payment success rate if it had more channels to more nodes in the network, and it could possibly achieve more privacy by routing its payments through different nodes. However, once this reputation system is implemented, could it in some cases incentivize nodes to open and use fewer channels (in the extreme case only one), so as to build more reputation on that channel and thereby achieve a higher payment success rate? This applies especially to nodes that don't forward many HTLCs.

bshramin avatar Jun 28 '23 05:06 bshramin

A lot of the discussion revolves around the specific reputation scheme proposed here. However, I don't think this should be part of the BOLTs, which only describe rules for communication between peers. While it is crucial to find a good way to compute reputation, that topic is already discussed elsewhere (mailing list, meetings); here we should focus on the actual spec change: a way to signal to the next node how confident we are that this HTLC will succeed. Different peers could even compute reputation differently, as long as we agree that an endorsed value of 0 means that we have low confidence that the HTLC will succeed and 1 means that we have higher confidence that it will.

The questions that need answering here are:

  • Do we agree that it's a good idea to transmit some information about our own assessment of the HTLC to the next peer?
  • How much do we want to transmit? Just one bit as suggested here or more?

I personally think that it is useful to transmit our confidence to the next peer and that the more precision we give, the more useful it is. However too much precision could be a privacy leak (if you receive two HTLCs with the same confidence, it probably means that they followed the same path and came from the same sender) so I think that having 8 confidence buckets (3 bits of information) would be a good compromise.
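As a rough illustration of the 3-bit idea, the sketch below maps a locally computed confidence in [0, 1] onto 8 buckets. The function name `confidence_bucket` and the assumption that confidence is a float between 0 and 1 are hypothetical; nothing here is specified by the PR.

```python
def confidence_bucket(confidence: float, bits: int = 3) -> int:
    """Quantize a confidence in [0, 1] into 2**bits equal-width buckets."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    buckets = 1 << bits
    # A confidence of exactly 1.0 would land in bucket `buckets`; clamp it.
    return min(int(confidence * buckets), buckets - 1)
```

With `bits=1` this reduces to the binary endorsement of the current proposal. The coarser the buckets, the more unrelated HTLCs share the same value, which is the privacy side of the trade-off described above.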

thomash-acinq avatar Jun 28 '23 11:06 thomash-acinq

While I think that resource bucketing can make sense as an MVP for how to interpret the endorsement mechanic laid out in BOLT2, I find myself resistant to this being in the main BOLT sections. Even with the designation of "MAY", I think this is better suited to be an extension BOLT or perhaps even a BLIP.

Agreed.

It can make sense for me as a node operator to let a node with lower reputation offer an HTLC forward with a large fee, when I'd be hesitant to do so at a lower fee. Similar to the way that higher interest rates are charged for borrowers with lower credit scores, we need not deny a forwarding request simply because the upstream link doesn't have the reputation we'd want.

That's very dangerous as an attacker can trivially exploit this: they just need to offer very high fees to compensate for their bad reputation (it doesn't cost them anything because they don't intend to actually pay the fees, they will just fail the HTLC).

So to summarize my criticisms of the resource bucketing strategy, it comes down to two things: 1. It does not account for the continuously variable nature of the costs of offering the slots/sats,

That's only a limitation of this specific algorithm to assign reputation, which as you said should not be part of the spec. However even when using a continuous reputation scheme, the binary endorsement forces you to discretize to 0 or 1. That's why I'm suggesting to replace the binary endorsement with a confidence value on 3 bits. A fully continuous value could be a privacy leak but I think that 3 bits is a good balance between the 1 bit of this proposal and a fully continuous value.

thomash-acinq avatar Jul 03 '23 08:07 thomash-acinq

That's very dangerous as an attacker can trivially exploit this: they just need to offer very high fees to compensate for their bad reputation (it doesn't cost them anything because they don't intend to actually pay the fees, they will just fail the HTLC).

This is far from a trivial exploit. It is already the case that the attacker has no way to know what their reputation is with respect to their peers. For them to be able to exploit it, they would need to know what your threshold for endorsement is, which isn't a publicly knowable thing. Additionally, even though offering high fees for offered HTLCs does not guarantee the loss of those sats, it is still a capital outlay requirement that can reduce the reach of these attacks as well as the attacker's bandwidth to accomplish them. That said, I'd imagine the reduction in effectiveness of the attack as a result of this increased cost is probably marginal at best, but this was also not suggested as a security scheme; I was simply pointing out that we cannot ignore the reward side of the incentive scheme when considering a node operator's interests.

That's only a limitation of this specific algorithm to assign reputation, which as you said should not be part of the spec. However even when using a continuous reputation scheme, the binary endorsement forces you to discretize to 0 or 1.

I actually think that this is a good thing. By forcing nodes to make a decision between 0 or 1 at the protocol level, you force the inputs to that decision to be a private matter, which ultimately it is. The node operator can either choose to tie its reputation to an HTLC or not.

That's why I'm suggesting to replace the binary endorsement with a confidence value on 3 bits. A fully continuous value could be a privacy leak but I think that 3 bits is a good balance between the 1 bit of this proposal and a fully continuous value.

I think that this convolutes things in a way that conceals the real dynamic in play. It is not the role of the endorser to "proxy" the reputation of its peers. The role of the endorser is to tie its own reputation to the HTLC it is offering. It is hard to understand how else to interpret the endorsement mechanic if it is allowed to have any more than 1 bit of signaling. Let's say we have 3 bits as you suggest, what happens if I endorse it to a level of 001 (000 being lowest and 111 being highest), and then the HTLC fails? What if the HTLC succeeds? What is my peer even trying to tell me when it gives a "partial endorsement"? The other issue with a continuous value is that it can basically be used as a measurement for how close to the payment source you are. Why would I endorse someone else's HTLC at a higher level than the upstream link did? Why wouldn't I ever endorse my own HTLC as 111?

Ultimately I believe the forced discretization of the endorsement is a good thing. In fact I believe that simply specifying that and having some discussion and recommendations around possible ways of interpreting endorsement (or non-endorsement), is enough for this proposal to be self-justifying and complete. I believe that the specifics of how to measure reputation and how to allocate HTLC slots/sats based off of reputation is beyond the scope of what this specification should offer.

Very often when we provide libraries we may also provide code examples to demonstrate how to use it, and I believe the resource bucketing scheme and ideas on how to measure and update reputation should not be viewed as anything more significant than a spec level code example. Compliance with these suggested schemes is neither enforceable nor can we expect nodes to adopt the same behaviors, so it really ought to be considered as a demo use of that endorsement bit.

ProofOfKeags avatar Jul 05 '23 17:07 ProofOfKeags

I still need to read @ariard's message thoroughly, but regarding its first sentence:

I think the major conceptual drawback of the proposal is the lack of a pure monetary strategy

This is a feature for now. IMHO a protocol proposal with a lot of maybes in different places needs to start simple, then add things on top if required. But we need to ensure that the reputation also makes sense in an implementation and not only on paper. If I choose to implement this in cln, it is only because it is simple to start with and flexible to extend with other penalties later.

vincenzopalazzo avatar Jul 18 '23 13:07 vincenzopalazzo

Going to move resource bucketing and reputation into a recommendations folder, as suggested by a few folks + discussed on IRC. Thanks all for taking a look - going to rework and bring out of draft (hopefully addressing some questions on the way with a more "proposal"-style format).

carlaKC avatar Jul 18 '23 15:07 carlaKC

This is a feature for now. IMHO a protocol proposal with a lot of maybes in different places needs to start simple, then add things on top if required.

If you look at the history of modern cryptography and the design of secure software systems, I think the opposite methodology is followed: one picks the worst realistic attack vectors and then lays out a foundation accordingly. E.g., ciphers and hash functions are not marginally reworked when you find a flaw; rather, you upgrade to a completely different cryptosystem.

With this insight in mind, any jamming mitigation should be robust against advanced adversaries who hold asymmetric advantages over a target node, both in terms of channel liquidity reserves and in up-to-date information about the state of the Lightning Network. Here a “pure” monetary strategy allows 100% coverage of HTLC forwarding risks, forcing your adversary to pay the highest price to jam all your channels. In particular, it yields the holistically optimized solution for network-wide attacks: https://jamming-dev.github.io/book/2-costs.html. And those advanced adversaries are realistic, with chain analysis companies starting to investigate LN.

So saying it’s a feature would be akin to saying the difficulty adjustment algorithm on the base layer is a feature because one naively assumes miners have constant hashrate capabilities IMHO :)

I’m still working on the anti-DoS tokens solution (a.k.a staking credentials), though here realistically the cryptosystems for correct blinding will take time. Effectively, I think the reputation algorithms could be reused across solutions.

On the discussion raised by other reviewers about whether the endorsement should be discrete or continuous: I think there is a trade-off between liquidity reliability (and all the payment traffic carried over it) and disclosure of the state of congestion of downstream links. Here I think a correct approach could be to give more information to your upstream channel counterparties as you bootstrap a better level of reputation with them, therefore enabling better link-level liquidity congestion management. I think the HTLC endorsement scheme would benefit from looking at TCP congestion algorithms and their different phases.

ariard avatar Jul 26 '23 18:07 ariard

https://lists.linuxfoundation.org/pipermail/lightning-dev/2023-August/004034.html

One technical recommendation could be to add a BOLT9 feature bit option (e.g option_jamming_dry_run) for the routing hops participating in the data collection, and to have implementations add a config flag for HTLC senders who wish to opt in to the collection. Depending on these settings, hops could be selected or pruned by the routing algorithm at payment path construction.

ariard avatar Aug 03 '23 19:08 ariard

One technical recommendation could be to add a BOLT9 feature bit option (e.g option_jamming_dry_run)

I also agree that a feature bit will be useful here, not for the dry_run but for the jamming mitigation itself. That way, we have a way to avoid connecting to nodes that do not support option_jamming.

This is not important during the dry_run, but it will be once we find a solution and merge it into the protocol.

vincenzopalazzo avatar Aug 29 '23 19:08 vincenzopalazzo

I also agree that a feature bit will be useful here, not for the dry_run but for the jamming mitigation itself. That way, we have a way to avoid connecting to nodes that do not support option_jamming.

I still think we should have a clear option_htlc_data_collection feature bit for routing hops, so that end-users who wish to preserve a high level of payer/payee privacy can opt out of having their HTLCs traverse such hops. See the concern, not expressed by me, here: https://lists.linuxfoundation.org/pipermail/lightning-dev/2023-August/004036.html

ariard avatar Sep 04 '23 21:09 ariard

I still think we should have a clear option_htlc_data_collection feature bit for routing hops, so that end-users who wish to preserve a high level of payer/payee privacy can opt out of having their HTLCs traverse such hops. See the concern, not expressed by me, here: https://lists.linuxfoundation.org/pipermail/lightning-dev/2023-August/004036.html

Why would anyone set a feature bit that would result in fewer relays? The protocol should always be aligned with financial incentives.

But regardless, this is irrelevant because there is nothing in this proposal about data collection. All implementations already collect some data, as you can't run a lightning node efficiently in the dark, but as long as the data is not shared, each node only knows two hops of the route. The email you're referring to actually suggests a solution so that this data can be used to evaluate this proposal without needing to be shared: each node runs the analysis on its own data and only a few aggregated metrics are shared.

thomash-acinq avatar Sep 05 '23 08:09 thomash-acinq

Why would anyone set a feature bit that would result in fewer relays? The protocol should always be aligned with financial incentives.

Latency, as you don’t have to log the HTLC on disk as a routing hop, and actually attracting more traffic and off-chain fees if users are sensitive to their HTLC data being retained by the routing hops traversed. If you’re familiar with VPNs, that is usually something people look at. Of course, there is no way to strictly enforce it in a cryptographic or punishment-based way (though I’m not even sure, as you might monitor latency timing?).

I won’t insist further as I’m not running a big lightning infrastructure and I don’t have to be concerned with my HTLC users data.

ariard avatar Sep 07 '23 01:09 ariard

Hodling for fun and profit

By design, reputation is costly to build but easily lost. In particular, HTLCs that take longer than 90s to resolve will decrease reputations of any nodes that endorsed them. And for every additional 90s it takes to resolve the HTLC, reputations are decreased further.

So any node on the network that receives regular HTLC traffic can hodl HTLCs to destroy reputations of the upstream nodes.
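To see why hodling is so damaging under this rule, here is a rough sketch of a penalty that grows with every 90s an HTLC stays unresolved. The scoring rule itself (reward of one fee when fast, penalty of one fee per slow period) is a hypothetical stand-in, not the formula from the proposal; only the 90s period comes from the text above.

```python
RESOLUTION_PERIOD_SECS = 90  # period quoted above


def slow_periods(resolution_secs: int) -> int:
    """Count the penalty periods for an HTLC held `resolution_secs` seconds.

    One period once the HTLC has taken longer than 90s, plus one more for
    each additional 90s started (boundary handling is an assumption).
    """
    if resolution_secs < 0:
        raise ValueError("resolution time cannot be negative")
    return max(0, (resolution_secs - 1) // RESOLUTION_PERIOD_SECS)


def reputation_delta_msat(fee_msat: int, resolution_secs: int, settled: bool) -> int:
    """Hypothetical reputation change for the peer that endorsed the HTLC."""
    periods = slow_periods(resolution_secs)
    if periods == 0:
        # Fast resolution: reward the fee only if the HTLC actually settled.
        return fee_msat if settled else 0
    # Slow resolution: penalize one fee per period, even if it settled.
    return -periods * fee_msat
```

Under these assumptions, an HTLC paying a 1000 msat fee that a downstream node holds for four hours costs the endorsing upstream peer 159 times the fee it would have earned in reputation, at no cost to the node doing the holding.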

Attack scenarios

Routing nodes, merchants, and LSPs on the network can exploit this weakness to destroy reputations of their competitors, essentially for free. Once reputations have been sufficiently destroyed, the competitors' channels can then also be jammed for ~0 cost.

A simple attack scenario could look like this:

  • EviLSP and HonestLSP are LSPs competing with each other. Both LSPs run lightning nodes that are well connected with the rest of the network, and the LSPs also have a direct channel with each other.
  • EviLSP starts to hodl all high-value HTLCs coming from HonestLSP. Just before the HTLCs approach their expiry, EviLSP forwards them on to the next node.
  • As HTLCs that have been in flight for hours start to settle, HonestLSP rapidly slashes the reputation scores of all its upstream channel peers.
  • EviLSP uses another lightning node to jam all of HonestLSP's channels.
  • In followup PR, EviLSP claims their node had a temporary glitch causing delayed processing but that everything is fine now and at least their service is working better than HonestLSP.
  • HonestLSP's users start switching to EviLSP.

Alternatively, EviLSP could be sneakier:

  • EviLSP occasionally hodls HTLCs forwarded to them from HonestLSP. The hodl frequency and duration is set high enough to have a negative influence in the reputation algorithm but low enough to not raise HonestLSP's suspicion.
  • After a few days or weeks, HonestLSP has slowly decreased the reputation scores of its upstream channel peers and no longer allows those peers to access its privileged slots.
  • EviLSP uses another lightning node to jam all of HonestLSP's channels.

Mitigation

I haven't come up with any great ideas to mitigate this weakness. Hopefully we can get more people thinking about this problem and potential solutions.

FYI - I believe this vector of attack for a 3-party topology of lightning nodes (HonestLSP <-> EvilLSP <-> upstream peers) and onliness equivocation has already been mentioned in the context of channel jamming discussions.

See the email thread "Hold fee rates as DoS protection (channel spamming and jamming)", where long-delay applications such as atomic onchain / offchain swaps (e.g lightning loops) are mentioned, and where a time-independent hold feerate has already been suggested as a mitigation.

Reputation multiplier effect

Because in-flight risk is calculated separately for each pair of incoming and outgoing channels, an attacker can exploit network topology to cause more jamming damage than they paid for while gaining reputation. See inline comment for more details.

Mitigation

If in-flight risk is calculated per incoming channel only (ignoring the outgoing channel), or simply per upstream node (which makes sense when multiple channels exist between two nodes), then the multiplier effect disappears.
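The difference between the two accounting choices can be sketched as follows. This is a toy model, not the proposal's algorithm: `admitted_risk`, the flat `limit`, and the channel names are hypothetical, and the limit stands in for whatever in-flight risk a peer's reputation allows.

```python
from collections import defaultdict


def admitted_risk(htlcs, limit, key):
    """Admit HTLCs in order while in-flight risk per `key` stays within `limit`.

    `htlcs` is a list of (incoming, outgoing, risk) tuples. Returns the total
    risk admitted per incoming channel.
    """
    in_flight = defaultdict(int)
    admitted = defaultdict(int)
    for incoming, outgoing, risk in htlcs:
        k = key(incoming, outgoing)
        if in_flight[k] + risk <= limit:
            in_flight[k] += risk
            admitted[incoming] += risk
    return dict(admitted)


def per_pair(incoming, outgoing):
    # Risk tracked separately for each (incoming, outgoing) channel pair.
    return (incoming, outgoing)


def per_incoming(incoming, outgoing):
    # Risk tracked per incoming channel only, ignoring the outgoing channel.
    return incoming


# One attacker channel spreading HTLCs over three outgoing channels.
attack = [("attacker", f"out{i}", 100) for i in range(3)]
```

With a limit of 100, per-pair accounting admits 300 units of risk from the single attacker channel, while per-incoming accounting admits only 100, which is the multiplier effect and its disappearance in miniature.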

I believe the downside of aggregated reputation for N incoming channels, whether or not they're associated with a unique lightning node, has already been considered in the past, with the controlled or uncontrolled scenarios.

See the digest post "Channel Jamming" documentation on bitcoin-problems made by one of the authors of the "Unjamming Lightning" paper, from which I believe this draft is partially inspired.

I think an even sneakier hodling-for-fun-and-profit style of exploitation is leveraging multi-path payments and the fact that there are gossiped htlc_minimum_msat values associated with each LSP's incoming routing channels.

PurpleTimez avatar Aug 18 '24 08:08 PurpleTimez