lnd
contractcourt: implement smarter "go to chain" heuristics in the channelArbitrator
The channelArbitrator, amongst many other things, is responsible for deciding when we should go to chain in order to sweep outputs from expired contracts. We currently use the shouldGoOnChain function to decide exactly when to go on chain. At the moment, the function is very basic: it determines that we need to go on chain if the HTLC is about to expire. However, this doesn't capture the cost of going on chain. For example, if we have a 1000 satoshi HTLC that's about to expire, and it would take 20k satoshis to fully sweep the output, then the cost simply isn't worth it. The shouldGoOnChain function should be modified to capture such opportunity cost heuristics.
Steps To Completion
- [ ] Extend the shouldGoOnChain method to take into account (all or some of): the on-chain fees to sweep an HTLC, the duration that an HTLC has been active, the fee revenue related to the channel we're examining (if the channel is popular, we shouldn't close it to sweep a single HTLC), and the CSV value.
- [ ] A series of tests should be written to ensure that each of the heuristics is properly implemented and adhered to by the channelArbitrator.
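The factors in the first step could be combined roughly as follows. This is only a sketch under stated assumptions: the `htlcContext` type, its fields, the `broadcastDelta` parameter, and the rule comparing the net sweep value against expected channel revenue are all hypothetical and are not taken from lnd's code.

```go
package main

import "fmt"

// htlcContext is a hypothetical bundle of the inputs named in the issue.
// None of these names come from lnd itself.
type htlcContext struct {
	AmountSat         int64 // value of the HTLC output
	SweepFeeSat       int64 // estimated total on-chain fee to fully sweep it
	BlocksToExpiry    int32 // blocks remaining until the HTLC times out
	ChannelRevenueSat int64 // assumed estimate of revenue lost by closing the channel
}

// shouldGoOnChainSketch reports whether going on chain looks worthwhile.
// It keeps the existing "about to expire" trigger and adds two cost checks.
func shouldGoOnChainSketch(c htlcContext, broadcastDelta int32) bool {
	// The HTLC is not yet close enough to expiry to act.
	if c.BlocksToExpiry > broadcastDelta {
		return false
	}

	// Sweeping costs at least as much as the HTLC is worth.
	net := c.AmountSat - c.SweepFeeSat
	if net <= 0 {
		return false
	}

	// Only close if the recovered value exceeds what the channel is
	// assumed to earn by staying open.
	return net > c.ChannelRevenueSat
}

func main() {
	// The example from the issue: a 1000 sat HTLC that costs 20k sat to sweep.
	small := htlcContext{AmountSat: 1000, SweepFeeSat: 20000, BlocksToExpiry: 5}
	fmt.Println(shouldGoOnChainSketch(small, 10)) // false

	// A large HTLC on a channel with modest expected revenue.
	large := htlcContext{AmountSat: 500000, SweepFeeSat: 20000, BlocksToExpiry: 5, ChannelRevenueSat: 1000}
	fmt.Println(shouldGoOnChainSketch(large, 10)) // true
}
```

Each early return corresponds to one heuristic, which would make it straightforward to write one test per heuristic as the second step asks.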
@Roasbeef I'd like to try this.
> the on chain fees to sweep an HTLC, the duration that an HTLC has been active
I think this is not so difficult if it is implemented like estimatesmartfee in bitcoind. It's the first task.
> the fee revenue related to the channel we're examining (as if the channel is popular we shouldn't close it to sweep a single HTLC), the CSV value
It is difficult to define what the CSV value means for LN.
So in some cases (say, high fees and a small HTLC), we'd just not go on chain and take the hit of losing the HTLC amount, rather than losing more by going on chain uneconomically?
> For example, if we have a 1000 satoshi HTLC that's about to expire, and it would take 20k satoshis to actually fully sweep the output, then the cost simply isn't worth it.
If so, I think that we'd need some way to record that we deemed an HTLC uneconomical to sweep. Otherwise, if it is swept by our peer, we end up with fewer funds than we'd expect and have to start following outputs on chain to figure out what happened.
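One way to picture such a record is a small ledger of written-off HTLCs that can later be reconciled against the expected balance. The sketch below is purely illustrative: `writtenOffHTLC`, `writeOffLedger`, and the channel ID string are hypothetical names, and a real implementation would presumably persist this data rather than keep it in memory.

```go
package main

import "fmt"

// writtenOffHTLC is a hypothetical record of an HTLC we chose not to sweep.
type writtenOffHTLC struct {
	ChannelID   string // identifier of the channel (format is an assumption)
	HTLCIndex   uint64 // index of the HTLC within the channel
	AmountSat   int64  // amount we gave up
	SweepFeeSat int64  // fee estimate that made the sweep uneconomical
}

// writeOffLedger keeps written-off HTLCs in memory for this sketch.
type writeOffLedger struct {
	entries []writtenOffHTLC
}

// Record stores the decision not to sweep an HTLC.
func (l *writeOffLedger) Record(e writtenOffHTLC) {
	l.entries = append(l.entries, e)
}

// TotalSat returns the total amount written off, which explains the gap
// between the expected and the actual balance.
func (l *writeOffLedger) TotalSat() int64 {
	var total int64
	for _, e := range l.entries {
		total += e.AmountSat
	}
	return total
}

func main() {
	var ledger writeOffLedger
	ledger.Record(writtenOffHTLC{ChannelID: "chan-a", HTLCIndex: 7, AmountSat: 1000, SweepFeeSat: 20000})
	ledger.Record(writtenOffHTLC{ChannelID: "chan-a", HTLCIndex: 9, AmountSat: 600, SweepFeeSat: 20000})
	fmt.Println(ledger.TotalSat()) // 1600
}
```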
As the author of #6749, which was closed as a duplicate of this one, I'm surprised to see that this issue has been open for quite a while without much activity. I would think it affects many operators, and fixing it would remove the possibility of wasting funds, which is always good. Is it particularly difficult to implement? Something like:
    if ($expected_chain_fee > $expected_htlc_loss * N) {
        skip_force_closure();
        log_skip();
    }
which would pretty much work if LND was written in PHP.
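Since lnd is written in Go, the same check might look like the sketch below. The function name `skipForceClose` and its parameters are hypothetical stand-ins for the variables in the snippet above, and the fee and loss estimates are assumed to be computed elsewhere.

```go
package main

import "log"

// skipForceClose mirrors the pseudocode above: it reports whether the
// expected chain fee exceeds n times the expected HTLC loss.
// All inputs are assumed to be estimated by the caller.
func skipForceClose(expectedChainFeeSat, expectedHTLCLossSat, n int64) bool {
	return expectedChainFeeSat > expectedHTLCLossSat*n
}

func main() {
	// 20k sat in fees to recover a 1000 sat HTLC, with a multiplier of 2.
	if skipForceClose(20_000, 1_000, 2) {
		log.Println("skipping force close: chain fee exceeds expected HTLC loss")
	}
}
```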
If a peer lets an htlc time out, this could be an indication that the peer is not reliable. If you decide not to force close because it is not worth it, maybe you want to avoid offering new htlcs until you're confident that the peer is reliable (again)?
I am not sure I follow. Was your intention to consider rejecting forwards to the peer, even after they come back online, at least for some time X until it can be inferred that they fixed their issues? If so, chan-disable-timeout and the other settings in the same section of lnd.conf would be the ones to use, no?
If you meant something else instead, would you please let me know? I am choosing not to read your reply as an attempt to undermine my suggestion 😄
A simpler idea from (https://github.com/lightningnetwork/lnd/issues/6933#issue-1380568348): only ignore this if we know it's our HTLC.
Two things to consider here:
- If we fail HTLCs upstream without closing the downstream channel, we create an attack vector where the downstream node can steal our channel balance a few dust HTLCs at a time.
- If we don't fail HTLCs upstream, we will get cascading force closes.
I propose that we do force close the downstream channel when HTLCs expire, regardless of whether it is economical to recover the HTLCs on chain. This shuts the door on any dust theft vectors. Even if the downstream node is honest and was just offline, we probably don't want to continue a channel relationship with such an unreliable peer.
Then as the upstream deadline approaches, we can fail the HTLC off chain to prevent cascading force closes.
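The two-step proposal above can be sketched as a pure decision function over block heights. Everything here is an assumption made for illustration: the `decision` type, the `decide` function, and the `upstreamSafetyDelta` margin do not correspond to existing lnd code.

```go
package main

import "fmt"

// decision is a hypothetical result describing what to do at a given height.
type decision struct {
	ForceCloseDownstream bool // close the outgoing channel on chain
	FailUpstreamOffChain bool // fail the incoming HTLC back off chain
}

// decide applies the proposal: force close downstream once the outgoing
// HTLC has expired, regardless of whether sweeping it is economical, then
// fail the incoming HTLC off chain once the upstream deadline is within
// upstreamSafetyDelta blocks.
func decide(height, downstreamExpiry, upstreamExpiry, upstreamSafetyDelta uint32) decision {
	var d decision
	if height >= downstreamExpiry {
		d.ForceCloseDownstream = true
	}
	if d.ForceCloseDownstream && height+upstreamSafetyDelta >= upstreamExpiry {
		d.FailUpstreamOffChain = true
	}
	return d
}

func main() {
	const (
		downstreamExpiry = 800000
		upstreamExpiry   = 800040
		safetyDelta      = 10
	)
	for _, h := range []uint32{799990, 800000, 800030} {
		fmt.Printf("height %d: %+v\n", h, decide(h, downstreamExpiry, upstreamExpiry, safetyDelta))
	}
}
```

With these example numbers, nothing happens before the downstream expiry, only the force close happens at height 800000, and both actions apply at height 800030, when the upstream deadline is 10 blocks away.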