rust-lightning icon indicating copy to clipboard operation
rust-lightning copied to clipboard

force close is not always broadcasting latest TX

Open zoedberg opened this issue 1 year ago • 3 comments

I'm using RLN, which uses a forked version of rust-lightning, where I've added support for RGB. I'm running an integration test (https://github.com/RGB-Tools/rgb-lightning-node/blob/master/src/test/close_force_standard.rs) where I open a channel between 2 nodes, make 2 payments via keysend and force close the channel.

After the channel closure I sometimes see a behavior (non-deterministic, but quite frequent) where the BumpTransactionEventHandler tries to bump the fees for an HTLC TX (we are using anchor outputs). This is strange because before closing the channel, during the keysend, we wait for the payment to become successful (which happens during the PaymentSent and PaymentClaimed events), therefore we expect the channel to have no pending HTLCs and the node to broadcast the last commiment TX.

I'm writing here first of all to understand if you think this is expected behavior or not. That is, waiting for the mentioned events should guarantee that the OnchainTxHandler will broadcast the last valid commitment TX?

I've also collected some logs, hoping this could help better understand the issue:

  • I see a call to provide_latest_holder_tx that sets the holder_commitment field to the latest commitment TX (the one we expect to be broadcasted, with no HTLCs)
  • the call to channel_manager.force_close_broadcasting_latest_txn happens after the holder_commitment is set to the expected commitment TX
  • by printing self.holder_commitment in get_maybe_signed_holder_tx I see the TX is not the expected one but it's the previous commitment TX (the one with 1 HTLC) instead

I don't think that our changes to support RGB could cause this issue. If you think this part is well-tested on your side I'll investigate more and maybe try to reproduce it on a vanilla node.

zoedberg avatar Jul 03 '24 17:07 zoedberg

FWIW, we saw similar flaky behavior in LDK Node's integration tests.

I might be wrong, but I suspect this could be a race with ChannelMonitor persistence that might have been introduced as a byproduct of #2957, as we no longer wait for persistence to finish before issuing the payment events? (cc @G8XSU)

tnull avatar Jul 04 '24 08:07 tnull

This is strange because before closing the channel, during the keysend, we wait for the payment to become successful (which happens during the PaymentSent and PaymentClaimed events), therefore we expect the channel to have no pending HTLCs and the node to broadcast the last commiment TX.

This is not a guarantee - PaymentSent indicates that we received the preimage back for our payment (i.e. might happen before we even see a new commitment transaction) and PaymentClaimed guarantee sthat the preimage is durably on disk (which might happen before we have a new commitment transaction). We try to get users these events once we're sure things are irrevocable, but that might happen on-chain rather than in-channel.

I see a call to provide_latest_holder_tx...

Hmm, this event order does somewhat surprise me, however. Are you sure you're seeing this on the same node's ChannelMonitor? Can you share the logs (and, to confirm, this is a fork based on LDK 0.0.123)?

I might be wrong, but I suspect this could be a race with ChannelMonitor persistence that might have been introduced as a byproduct of https://github.com/lightningdevkit/rust-lightning/pull/2957, as we no longer wait for persistence to finish before issuing the payment events? (cc @G8XSU)

That shouldn't impact this. That change only impacts persists based on block-connection, but here we're (I think?) just talking about persists due to channel state changes, and specifically when the ChannelManager generates events wrt those persists.

TheBlueMatt avatar Jul 08 '24 16:07 TheBlueMatt

This is not a guarantee - PaymentSent indicates that we received the preimage back for our payment (i.e. might happen before we even see a new commitment transaction) and PaymentClaimed guarantee sthat the preimage is durably on disk (which might happen before we have a new commitment transaction). We try to get users these events once we're sure things are irrevocable, but that might happen on-chain rather than in-channel.

Got it, thanks for the explanation. So I guess in our tests we should keep a small sleep to be sure the force close operation will always broadcast the latest commitment TX.

Hmm, this event order does somewhat surprise me, however. Are you sure you're seeing this on the same node's ChannelMonitor? Can you share the logs (and, to confirm, this is a fork based on LDK 0.0.123)?

I confirm the fork is based on LDK 0.0.123, but nice catch, I wasn't checking which node was printing the change to self.holder_commitment and after rerunning the test with the node information I can now say there is no strange behavior, the node correctly broadcasts the last TX assigned to self.holder_commitment. Sorry for my incorrect reporting. I guess this issue can be closed.

zoedberg avatar Jul 11 '24 12:07 zoedberg