rust-lightning icon indicating copy to clipboard operation
rust-lightning copied to clipboard

Don't Always Broadcast latest state on startup if we'd previously closed-without-broadcast

Open TheBlueMatt opened this issue 3 years ago • 8 comments

On startup the ChannelManager will broadcast the latest state transaction(s) for any ChannelMonitor it has for which it has no Channel. This is all fine and great except if we'd previously closed the ChannelMonitor with a ChannelForceClosed { should_broadcast: false } indicating it may be unsafe to broadcast. In this case, we should disable automated broadcasting. This should just require tracking the should_broadcast state in the ChannelMonitor itself and checking it before broadcasting (maybe a new automated flag on broadcast_Latest_holder_commitment_txn?)

This is really "the" issue on #775 so I'm gonna close that in favor of this one.

TheBlueMatt avatar Jun 23 '22 21:06 TheBlueMatt

I want to take this as my first issue.

When you say "On startup the ChannelManager will broadcast the latest state transaction(s) for any ChannelMonitor it has for which it has no Channel" I assume you mean channelmanager.rs:6823-6828 where it checks for each channel monitor if it's funding transaction is not in the funding transactions set, right? Then if I understand this correctly, when de-serializing the persisted state, every ChannelMonitorUpdateStep for a ChannelMonitor is replayed so that we build up to the state we had before the last shutdown. Then, the solution you propose would be to

  • Add a new property allow_automated_broadcast to ChannelMonitorImpl that defaults to true
  • On update_monitor, when matching ChannelForceClosed with should_broadcast: false set the flag to false
  • On broadcast_latest_holder_commitment_txn check for the flag and if it's false then... log_info! and do nothing?

Some questions I have:

  • It feels odd to have a method called braodcast... that won't always broadcast. Should we either:

    • Rename the method to maybe_broadcast_latest_holder_commitment_txn, so that it's explicit that under certain conditions it will not broadcast?
    • Define a new method maybe_broadcast... that wraps up the current one but checks the flag first and use this new method in this specific use case?

    I'm also asking because the method is also called when the de-serialized ChannelManager is stale compared to the ChannelMonitor, though in this case there is a channel in the founding transactions set (lines 6789-6804). I assume in that case too we want the flag to stop us, but I want to make sure I'm not changing some important semantics.

  • Is there any other ChannelMonitorUpdateStep that would imply that it's safe again to broadcast? Looking at the docs for the Enum I would guess no, but asking just in case.

QPotato avatar Jun 25 '22 01:06 QPotato

Heh, 1564 didn't resolve this, it just referenced it...

TheBlueMatt avatar Jun 27 '22 16:06 TheBlueMatt

Yes, your understanding is correct, as for your questions:

It feels odd to have a method called braodcast... that won't always broadcast. Should we either:

Probably the second, its still important that users be able to broadcast the latest state if they want to, so we at least need an option to broadcast the state, though it also wouldn't be unreasonable to split it into two methods - maybe_ and force_..._unsafe to indicate that the first will only work if we think its okay to do so, and the second will force it to happen even if its unsafe to do so.

I'm also asking because the method is also called when the de-serialized ChannelManager is stale compared to the ChannelMonitor, though in this case there is a channel in the founding transactions set (lines 6789-6804). I assume in that case too we want the flag to stop us, but I want to make sure I'm not changing some important semantics.

I think in that case, yes, the flag should stop us (though of course generally the flag is not set).

Is there any other ChannelMonitorUpdateStep that would imply that it's safe again to broadcast? Looking at the docs for the Enum I would guess no, but asking just in case.

No, once its unsafe its unsafe. Unsafe here means we've lost some data, there's no way to recover that data.

TheBlueMatt avatar Jun 27 '22 16:06 TheBlueMatt

Great. I'll get to work into the PR then. Thanks!

QPotato avatar Jun 27 '22 22:06 QPotato

I'm having a hard time trying to test the code. I'll keep working on it but I'm posting my progress here in case anyone wants to throw help/comments/ideas. My idea would be to cover this new behavior with a simple unit test that:

  • Builds a ChannelMonitor
  • Sends it a ForceClosed{should_broadcast: false}
  • Writes and reads it again
  • And then tries to call the new methods and asserts whether the transaction was broadcasted.

My plan was to copy and adapt some other other unit test for broadcast_latest_holder_commitment_txn but couldn't find any. If I understand correctly, you write unit tests for this at the end of the module and integration tests in separate modules, mostly in funcional_tests.rs. I also found:

  • A TestPersister that I think implement Write And Read in-memory.
  • A TestBroadcaster that can probably help me with the assertions
  • A check_closed_broadcast! which I believe won't be useful here cause it's checking whether we broadcasted the channel closed message as a gossip update for network peers. Though I'm not really sure I understood correctly.
  • On your last PR Matt, you added do_test_data_loss_protect which seems to be a more general case integration test. Maybe I should just (or also) extend that one?

QPotato avatar Jul 04 '22 18:07 QPotato

I left a small pointer on your PR and let's continue there but in general its probably simplest and most-coverage to add a new functional test based on an existing one.

TheBlueMatt avatar Jul 05 '22 01:07 TheBlueMatt

2025-01-30T10:33:55.371Z hydra_bitcoin::node::logger:22: A ChannelManager is stale compared to the current ChannelMonitor! imports.wbg.__wbg_error_80de38b3f7cc3c3c @ hydra_node.js:9025 hydra_node.js:9025 ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The channel will be force-closed and the latest commitment transaction from the ChannelMonitor broadcast. imports.wbg.__wbg_error_80de38b3f7cc3c3c @ hydra_node.js:9025 hydra_node.js:9025 ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The ChannelMonitor for channel b37b4660dec0340c000ab33a9c877d671a7e4fe3f7f9945cd9704668e760430f is at update_id 27 but the ChannelManager is at update_id 26. imports.wbg.__wbg_error_80de38b3f7cc3c3c @ hydra_node.js:9025 hydra_node.js:9025 ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The ChannelMonitor for channel b37b4660dec0340c000ab33a9c877d671a7e4fe3f7f9945cd9704668e760430f is at holder commitment number 281474976710644 but the ChannelManager is at holder commitment number 281474976710645. imports.wbg.__wbg_error_80de38b3f7cc3c3c @ hydra_node.js:9025 hydra_node.js:9025 ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: A ChannelManager is stale compared to the current ChannelMonitor! imports.wbg.__wbg_error_80de38b3f7cc3c3c @ hydra_node.js:9025 hydra_node.js:9025 ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The channel will be force-closed and the latest commitment transaction from the ChannelMonitor broadcast. imports.wbg.__wbg_error_80de38b3f7cc3c3c @ hydra_node.js:9025 hydra_node.js:9025 ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The ChannelMonitor for channel d91933328927a0fcd20613150cbb852bf8d9d5873b716ab44ce50d504975b8a4 is at update_id 15 but the ChannelManager is at update_id 13. imports.wbg.__wbg_error_80de38b3f7cc3c3c @ hydra_node.js:9025 hydra_node.js:9025 ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The ChannelMonitor for channel d91933328927a0fcd20613150cbb852bf8d9d5873b716ab44ce50d504975b8a4 is at holder commitment number 281474976710649 but the ChannelManager is at holder commitment number 281474976710650. imports.wbg.__wbg_error_80de38b3f7cc3c3c @ hydra_node.js:9025 hydra_node.js:9025 ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The ChannelMonitor for channel d91933328927a0fcd20613150cbb852bf8d9d5873b716ab44ce50d504975b8a4 is at revoked counterparty transaction number 281474976710650 but the ChannelManager is at revoked counterparty transaction number 281474976710651. imports.wbg.__wbg_error_80de38b3f7cc3c3c @ hydra_node.js:9025

is this related?

philipp1992 avatar Jan 30 '25 10:01 philipp1992

@philipp1992 not quite. It seems an update was made to the channel that was persisted at the ChannelMonitor, but not at the ChannelManager. Are you using async monitor persistence? Did the node crash or not shutdown cleanly? Sharing some more logs before the restart may be helpful.

wpaulino avatar Jan 30 '25 18:01 wpaulino

https://github.com/lightningdevkit/rust-lightning/pull/3893#issuecomment-3014239444

@TheBlueMatt should this issue be closed in favor of https://github.com/lightningdevkit/rust-lightning/pull/3881 that essentially makes this a non-issue?

martinsaposnic avatar Jun 30 '25 15:06 martinsaposnic

I don't feel super strongly that we shouldn't do #3893 and then revert it and drop the FC-without-broadcast stuff once we have a cleaner way of doing recovery (vs just dropping it now an adding the recovery mechanism separately). I was dropping the FC-without-broadcast logic now because it seemed easier than fixing it (before eventually dropping it), but I'm open to pushback here. Given I don't think its been (ever) used (in part because its always been broken), I would lean towards just ripping it out, but again definitely open to a broader discussion on that.

TheBlueMatt avatar Jul 08 '25 21:07 TheBlueMatt

Closed in #3881.

TheBlueMatt avatar Aug 31 '25 01:08 TheBlueMatt