htlcswitch: final settle signal
This PR adds a new final settle state for incoming htlcs. An htlc reaches this state after the update_fulfill_htlc update has been fully locked in.
A new rpc IsHtlcSettled is added that can be used to check whether an htlc has reached the new state.
Additionally, an event is added to the existing SubscribeHtlcEvents stream. This allows users to wait for final settlement without polling IsHtlcSettled. It is important to have a non-polling mechanism because waiting for final settlement is likely to be part of the critical path of a lightning payment.
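As an illustration of how a client could consume that event instead of polling, here is a minimal Go sketch against the existing routerrpc SubscribeHtlcEvents stream. The final-settle event shape (GetFinalHtlcEvent, Settled) is an assumption based on this PR's direction, so check the generated protos of the lnd build you use.

```go
package htlcwait

import (
	"context"

	"github.com/lightningnetwork/lnd/lnrpc/routerrpc"
	"google.golang.org/grpc"
)

// waitForFinalSettle blocks until the incoming htlc identified by
// (chanID, htlcIndex) is reported as finally settled on the htlc event
// stream, or until the stream or context fails.
func waitForFinalSettle(ctx context.Context, conn *grpc.ClientConn,
	chanID, htlcIndex uint64) error {

	router := routerrpc.NewRouterClient(conn)
	stream, err := router.SubscribeHtlcEvents(
		ctx, &routerrpc.SubscribeHtlcEventsRequest{},
	)
	if err != nil {
		return err
	}

	for {
		event, err := stream.Recv()
		if err != nil {
			return err
		}

		// Only look at events for the htlc we care about.
		if event.IncomingChannelId != chanID ||
			event.IncomingHtlcId != htlcIndex {

			continue
		}

		// GetFinalHtlcEvent and Settled are assumed names for the
		// final resolution event added by this PR.
		if final := event.GetFinalHtlcEvent(); final != nil &&
			final.Settled {

			return nil
		}
	}
}
```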
More background info and design discussion: https://groups.google.com/a/lightning.engineering/g/lnd/c/Sfy_M9avQoU
Fixes https://github.com/lightningnetwork/lnd/issues/6208
TODO:
- [x] Add failed resolution
- [x] Tests
- [x] Notification from onchain fail/settle flow
- [ ] Remove debug htlcsettled lncli command
Looking for concept+design ACK
Concept ACK
Is there perhaps any unconsidered complexity for situations where multiple HTLCs exist in the commitment transaction for a single channel? Or would the invoice status not depend at all on any other HTLCs in the channel?
Is there perhaps any unconsidered complexity for situations where multiple HTLCs exist in the commitment transaction for a single channel?
There should not be such complexity. The code as is should record all final settlements of htlcs, regardless of whether they are grouped together on the same commitment.
Or would the invoice status not depend at all on any other HTLCs in the channel?
For a multi-part payment, the invoice status could depend on multiple htlcs in the same channel. But the scope of this PR does not include invoices at all.
To determine whether a received payment is final, one should retrieve all the settled htlcs from the invoice database and then look them up one by one, for example via the api that is added in this PR, to see if the settlements are final.
Obviously this can be streamlined later on for invoices specifically.
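To make that lookup flow concrete, a rough Go sketch: fetch the invoice's htlcs via LookupInvoice and check each settled htlc individually. The isHtlcFinal helper is a hypothetical stand-in for the IsHtlcSettled rpc added here; its exact request/response shape isn't final.

```go
package invoicecheck

import (
	"context"

	"github.com/lightningnetwork/lnd/lnrpc"
)

// isHtlcFinal is a stand-in for the new per-htlc lookup rpc added in this
// PR; wire it up to the real call once the rpc shape is settled.
func isHtlcFinal(ctx context.Context, chanID, htlcIndex uint64) (bool, error) {
	// TODO: call the new lookup rpc here.
	return false, nil
}

// paymentIsFinal reports whether every settled htlc paying the invoice has
// also reached the new final-settle state.
func paymentIsFinal(ctx context.Context, ln lnrpc.LightningClient,
	paymentHash []byte) (bool, error) {

	invoice, err := ln.LookupInvoice(
		ctx, &lnrpc.PaymentHash{RHash: paymentHash},
	)
	if err != nil {
		return false, err
	}

	for _, htlc := range invoice.Htlcs {
		// Only settled htlcs count towards the received amount.
		if htlc.State != lnrpc.InvoiceHTLCState_SETTLED {
			continue
		}

		final, err := isHtlcFinal(ctx, htlc.ChanId, htlc.HtlcIndex)
		if err != nil {
			return false, err
		}
		if !final {
			return false, nil
		}
	}

	return true, nil
}
```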
Waiting for design ack for the direction set out in this PR. In particular, storing the settled htlc ids in a new bucket settled-htlcs and doing so atomically in the channel state update transaction.
I think this patch shouldn't use the shortchanid, and should instead use the funding outpoint - we shouldn't add more code that needs reorg handling. I also think there should be benchmarks for lookup latency with N settles over M channels.
In the revocation PR, you brought up the idea of looking at the resolvers instead of doing this. Why would you need this if you weren't the exit hop? Maybe this is because I don't understand the htlc interceptor settle flow where you intercept and settle back, already knowing the preimage. Another case is maybe you want to see if you've lost money as an intermediate node due to not being able to claim on the incoming?
Another thought is: could this use some sort of hash accumulator? It seems that you're only performing lookups on the set... Not sure of a golang library that can help here.
I think this patch shouldn't use the shortchanid, and should instead use the funding outpoint - we shouldn't add more code that needs reorg handling.
Are reorgs even supported generally for channels that are fully confirmed?
But I get the point. I use short channel id because we also use that in the htlc interception api. These two apis are related and can be used together. Short chan id may also not have been the best decision for htlc interception?
I could convert the PR to outpoints and then leave it up to the user to do the mapping from short chan ids if necessary?
I also think there should be benchmarks for lookup latency with N settles over M channels.
What scenario are you thinking of specifically when performance would be critical? LND internally doesn't do any lookups, it would just be load generated by a connected application.
In the revocation PR, you brought up the idea of looking at the resolvers instead of doing this. Why would you need this if you weren't the exit hop? Maybe this is because I don't understand the htlc interceptor settle flow where you intercept and settle back, already knowing the preimage. Another case is maybe you want to see if you've lost money as an intermediate node due to not being able to claim on the incoming?
Linking the resolver idea here: https://github.com/lightningnetwork/lnd/pull/6347#discussion_r868016350
I think just looking at the resolvers is not sufficient, because you don't get a signal when an htlc settles off-chain in the happy flow. Without that signal, an application still doesn't know when the payment truly cleared.
One scenario where you need this as a routing node is if you use the htlc interceptor to create virtual channels. We use it with lnmux. I believe LSPs for mobile users have similar requirements for just-in-time opened channels. And indeed, more generally for exact accounting on a routing node you probably want to know if you were able to claim all your incoming forwards. For that though, the resolver information might be sufficient as it is not time critical. Maybe there is an edge case if the counterparty doesn't go to chain (yet) to timeout the htlc.
Another thought is: could this use some sort of hash accumulator? It seems that you're only performing lookups on the set... Not sure of a golang library that can help here.
Not familiar with hash accumulators, but isn't that a probabilistic structure that can lead to false positives? False positives wouldn't be good for this application.
Are reorgs even supported generally for channels that are fully confirmed?
No, my point is that (I think) eventually we'll be moving away from SCIDs to identify a channel, so it may not make sense to make another place where that's done.
Short chan id may also have not been the best decision for htlc interception?
I don't think you have a choice, the switch is pretty tied to SCIDs at the moment.
What scenario are you thinking of specifically when performance would be critical? LND internally doesn't do any lookups, it would just be load generated by a connected application.
What about the rpcserver call to check if something is settled? Additionally, some nodes will be inserting frequently into this global settled bucket so that's where the concern comes from. I'm basically wondering if a large bucket here makes insertion/lookups very slow due to bbolt traversing the b+ tree and having to cache a lot of leaves.
Not familiar with hash accumulators, but isn't that a probabilistic structure that can lead to false positives? False positives wouldn't be good for this application.
I was thinking of something like utreexo that had proof inclusion, but after glancing at the paper, that doesn't actually help with storage here.
No, my point is that (I think) eventually we'll be moving away from SCIDs to identify a channel, so it may not make sense to make another place where that's done.
Can you get confirmation on that before I convert this PR?
What scenario are you thinking of specifically when performance would be critical? LND internally doesn't do any lookups, it would just be load generated by a connected application.
What about the rpcserver call to check if something is settled? Additionally, some nodes will be inserting frequently into this global settled bucket so that's where the concern comes from. I'm basically wondering if a large bucket here makes insertion/lookups very slow due to bbolt traversing the b+ tree and having to cache a lot of leaves.
My thinking here is that that single db lookup should be negligible compared to all the updates that are required to just lock in and settle an htlc on the channel. It can also be a batch lookup for more efficiency. The insert itself is part of an existing transaction, so at least that wouldn't add another write lock.
I can do some benchmarking once the shape of this PR is more final. Linking related PR https://github.com/lightninglabs/neutrino/pull/213, which @Roasbeef brought up in the review club.
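For reference, a sketch of what the insert could look like at the raw bbolt level, riding on a transaction that is already open for the channel state update. The bucket names, key layout, and single outcome byte are illustrative assumptions rather than this PR's exact schema, and lnd goes through its kvdb abstraction rather than bbolt directly.

```go
package finalhtlc

import (
	"encoding/binary"

	bolt "go.etcd.io/bbolt"
)

// Top-level bucket sharded by channel:
// final-htlcs/<short chan id>/<htlc index> -> outcome byte.
var finalHtlcsBucket = []byte("final-htlcs")

// putFinalHtlc records the final outcome of an htlc inside an already-open
// read/write transaction, so no extra write lock is taken beyond the channel
// state update that carries it.
func putFinalHtlc(tx *bolt.Tx, shortChanID, htlcIndex uint64,
	settled bool) error {

	top, err := tx.CreateBucketIfNotExists(finalHtlcsBucket)
	if err != nil {
		return err
	}

	var chanKey [8]byte
	binary.BigEndian.PutUint64(chanKey[:], shortChanID)

	chanBucket, err := top.CreateBucketIfNotExists(chanKey[:])
	if err != nil {
		return err
	}

	var htlcKey [8]byte
	binary.BigEndian.PutUint64(htlcKey[:], htlcIndex)

	outcome := byte(0)
	if settled {
		outcome = 1
	}

	return chanBucket.Put(htlcKey[:], []byte{outcome})
}
```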
Can you get confirmation on that before I convert this PR?
Yes, I confirmed it
The insert itself is part of an existing transaction, so at least that wouldn't add another write lock.
I am concerned about the insert here
@Crypt-iQ if you talk about the funding outpoint, do you mean txid:index, or the "compressed" 32-byte format that xors the index into the txid? I don't think we use the 32-byte format on the rpc interface anywhere yet.
@Crypt-iQ if you talk about the funding outpoint, do you mean txid:index, or the "compressed" 32-byte format that xors the index into the txid? I don't think we use the 32-byte format on the rpc interface anywhere yet.
the outpoint, not the channel id (which is the 32-byte representation)
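For anyone following along, the 32-byte form is just the funding txid with the big-endian output index XORed into its last two bytes. A small illustrative sketch; lnd's lnwire package has an equivalent helper (NewChanIDFromOutPoint).

```go
package chanid

import (
	"github.com/btcsuite/btcd/wire"
)

// chanIDFromOutPoint derives the 32-byte channel id from a funding outpoint
// by XORing the big-endian output index into the last two bytes of the
// funding txid, as defined in BOLT 2.
func chanIDFromOutPoint(op wire.OutPoint) [32]byte {
	var cid [32]byte
	copy(cid[:], op.Hash[:])

	cid[30] ^= byte(op.Index >> 8)
	cid[31] ^= byte(op.Index)

	return cid
}
```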
Another thing to be aware of is that we need to keep the mapping around for closed channels too. For example, when you LookupInvoice and want to get the final status of the htlc, you still need to map the short chan ids in that output to outpoints, and the channels may be closed by that time. Do you foresee any problems with this, or is it just a matter of downloading the mapping from ClosedChannels, or looking up on-chain based on short chan id? It's not the most convenient for the caller though.
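A hedged sketch of what that mapping could look like on the caller side, building a short chan id to channel point table from both open and closed channels:

```go
package mapping

import (
	"context"

	"github.com/lightningnetwork/lnd/lnrpc"
)

// scidToOutpoint builds a short channel id -> funding outpoint (channel
// point) map from both open and closed channels, so htlcs in channels that
// have since closed can still be resolved to an outpoint.
func scidToOutpoint(ctx context.Context,
	ln lnrpc.LightningClient) (map[uint64]string, error) {

	mapping := make(map[uint64]string)

	open, err := ln.ListChannels(ctx, &lnrpc.ListChannelsRequest{})
	if err != nil {
		return nil, err
	}
	for _, c := range open.Channels {
		mapping[c.ChanId] = c.ChannelPoint
	}

	closed, err := ln.ClosedChannels(ctx, &lnrpc.ClosedChannelsRequest{})
	if err != nil {
		return nil, err
	}
	for _, c := range closed.Channels {
		mapping[c.ChanId] = c.ChannelPoint
	}

	return mapping, nil
}
```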
Did a simple benchmark to check the performance implications of saving the final htlc outcome to disk.
- Alice and Bob, single channel
- Call lncli sendpayment 20 times to try to pay to Bob with an unknown hash
- Bob fails the htlc and writes the final outcome in the new bucket

Master branch: 34.412 sec
This branch: 33.861 sec
Not a statistically significant difference.
Of course testing options are endless here. But not sure if it is needed. The buckets are already sharded by channel, so the number of entries will be similar to the revocation bucket with a much smaller size. Can't we conclude from that that any performance impact is negligible?
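If a more targeted number is ever wanted, a micro-benchmark at the bbolt level is straightforward. A sketch, measuring raw bbolt rather than lnd's kvdb layer, with placeholder bucket name and value:

```go
package finalhtlc

import (
	"encoding/binary"
	"path/filepath"
	"testing"

	bolt "go.etcd.io/bbolt"
)

// BenchmarkFinalHtlcInsert measures the raw cost of inserting final htlc
// records into a single bbolt bucket, as a rough proxy for the write
// overhead discussed above.
func BenchmarkFinalHtlcInsert(b *testing.B) {
	db, err := bolt.Open(filepath.Join(b.TempDir(), "test.db"), 0600, nil)
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		err := db.Update(func(tx *bolt.Tx) error {
			bucket, err := tx.CreateBucketIfNotExists(
				[]byte("final-htlcs"),
			)
			if err != nil {
				return err
			}

			var key [8]byte
			binary.BigEndian.PutUint64(key[:], uint64(i))

			return bucket.Put(key[:], []byte{1})
		})
		if err != nil {
			b.Fatal(err)
		}
	}
}
```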
In the context of this PR, shall I add ListChannelHtlcs and DeleteChannelHtlcs to list and delete all records for a given channel? Open to suggestions for better ways/calls to manage disk space.
Alternatively those can be implemented in a follow-up PR.
Added unit and itest coverage.
@crypt-iq: review reminder
@joostjager, remember to re-request review from reviewers when ready
Rebased
I realize that I am missing the on-chain resolution of dust htlcs. They don't go through a resolver. Todo.
Extended channel arbitrator to store final failure for incoming dust htlcs.
Initially the issue was related to the invoice settlement flow: we marked the invoice as settled without waiting for the commitment dance to be finished. And I think the fix there is already mentioned by @joostjager, that we add a new state to the invoice to mark the fact that it's waiting to be settled. Imo that's the proper fix; we don't want to report a settled state while in fact it's not.
Thus I think saving all settled htlcs is not the fix to that issue, but rather a workaround. If we really want to have this feature, I think it’s better to have it opt-in as it does use more disk space and may not be a universally needed feature.
This has been discussed previously on the mailing list: https://groups.google.com/a/lightning.engineering/g/lnd/c/Sfy_M9avQoU
Just fixing invoices doesn't help with intercepted htlcs.
I can make the feature opt-in for sure if that's required.
Speaking of disk space, I think the info saved here can already be found in the revocation log if we modify the revocation log to also mark what happens to the htlcs saved. This means while the channel stays open, we can query the log to see whether a given htlc has been settled. And once it’s closed, I guess the need for such a query is not strong? Even then, we can prune the revocation logs to save less, or dump the related data to the new bucket created in this PR. This should create a more efficient way to store the data.
I don't think it is possible, see https://github.com/lightningnetwork/lnd/issues/6208#issuecomment-1026835144
And we still need the function to delete the data on demand to prevent the db from growing forever.
We said to do it in a follow-up, see https://github.com/lightningnetwork/lnd/pull/6517#pullrequestreview-1099703975
And making this opt-in can complicate things. So I guess I'm not sure, tho leaning towards making it opt-in, at least give users an option.
Given the advanced state of this PR, would you be okay with discussing and implementing the opt-in in another PR, possibly combined with delete functionality?
Removed dev cli command. Saved here: https://github.com/bottlepay/lnd/commit/529241691c81ddde42e66a0c1b99b25607e50a19
I'd like to clarify what to expect in the API of lnd 0.16 regarding this issue. I saw @joostjager 's tweet
To address this, lnd 0.16 will contain a new api that provides insight into the final resolution of htlcs. It's another step in the maturation process of lightning. Use it to your advantage.
I'm currently using Alex Bosworth's lightning library, checking for settlement on an invoice subscription. Is there or will there be an event in the invoices subscription stream from lnd for final settlement in v0.16?
Is there or will there be an event in the invoices subscription stream from lnd for final settlement in v0.16?
Currently there is not. I've added steps to the PR description for how to do it at the moment.
@joostjager I seem to end up in this piece of code here for every new block for a channel that's still open. The htlc is apparently not handled.
This PR appears to fix the problem by finalizing sub dust incoming htlcs. So I'm considering applying this PR. But the following comment suggests that the channel should already be closed: https://github.com/lightningnetwork/lnd/pull/6517/files#diff-a0b8064876b1b1d6085fa7ffdbfd38c81cb06c1ca3f34a08dbaacba203cda3ebR2227-R2228
Do you have an idea whether it will be problematic that the channel is still open when handling the HtlcIncomingDustFinalAction?
I seem to end up in this piece of code here for every new block for a channel that's still open.
What is it that you're trying to do exactly? I could use a bit more context to answer the question.
If the channel is still open and there are no htlcs at risk (or htlcs at risk are dust), there is no action to be taken for the channel arbitrator.
If the channel is still open and there are no htlcs at risk (or htlcs at risk are dust), there is no action to be taken for the channel arbitrator.
@joostjager I've got expired htlcs, some are even 520 days old. I get "go to chain for incoming htlc" logs for these htlcs on every new block. So it seems these htlcs are never resolved. Then I noticed that these HTLCs never get any action assigned if they're sub dust. So they never go to chain, but also never end up in a final state. I currently have the suspicion that many of these sub dust incoming htlcs are clogging up the work LND is doing and that this is causing new htlcs to expire. Hence I thought applying this PR, where the htlcs do end up in a final state, could help.
I don't think that this PR is going to help with that. It only adds reporting of the on-chain resolution to the final resolution table, but doesn't change the actual resolution / mark as complete process itself.
I don't think that this PR is going to help with that. It only adds reporting of the on-chain resolution to the final resolution table, but doesn't change the actual resolution / mark as complete process itself.
Thanks for the quick reply.