Don't always broadcast latest state on startup if we'd previously closed-without-broadcast
On startup the `ChannelManager` will broadcast the latest state transaction(s) for any `ChannelMonitor` it has for which it has no `Channel`. This is all fine and great, except if we'd previously closed the `ChannelMonitor` with a `ChannelForceClosed { should_broadcast: false }`, indicating it may be unsafe to broadcast. In this case, we should disable automated broadcasting. This should just require tracking the `should_broadcast` state in the `ChannelMonitor` itself and checking it before broadcasting (maybe a new automated flag on `broadcast_latest_holder_commitment_txn`?)
This is really "the" issue on #775 so I'm gonna close that in favor of this one.
I want to take this as my first issue.
When you say "On startup the ChannelManager will broadcast the latest state transaction(s) for any ChannelMonitor it has for which it has no Channel", I assume you mean channelmanager.rs:6823-6828, where it checks, for each channel monitor, whether its funding transaction is not in the funding transactions set, right? Then, if I understand this correctly, when de-serializing the persisted state, every `ChannelMonitorUpdateStep` for a `ChannelMonitor` is replayed so that we build up to the state we had before the last shutdown. The solution you propose would then be to:

- Add a new property `allow_automated_broadcast` to `ChannelMonitorImpl` that defaults to `true`.
- On `update_monitor`, when matching `ChannelForceClosed` with `should_broadcast: false`, set the flag to `false`.
- On `broadcast_latest_holder_commitment_txn`, check for the flag, and if it's `false` then... `log_info!` and do nothing?
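The three steps above could be sketched roughly as follows. This is a minimal, self-contained model, not the actual rust-lightning code: `MonitorSketch` and `Update` are hypothetical stand-ins for `ChannelMonitorImpl` and `ChannelMonitorUpdateStep`.

```rust
// Hypothetical sketch of the proposed flag logic; names are illustrative.

struct MonitorSketch {
    // Proposed field: defaults to true, flipped to false once we learn
    // broadcasting the latest state may be unsafe.
    allow_automated_broadcast: bool,
}

enum Update {
    ChannelForceClosed { should_broadcast: bool },
    // ...other ChannelMonitorUpdateStep variants elided
}

impl MonitorSketch {
    fn new() -> Self {
        MonitorSketch { allow_automated_broadcast: true }
    }

    // On update_monitor: a ChannelForceClosed with should_broadcast: false
    // latches the flag off; nothing ever sets it back to true.
    fn apply_update(&mut self, update: Update) {
        if let Update::ChannelForceClosed { should_broadcast: false } = update {
            self.allow_automated_broadcast = false;
        }
    }

    // On broadcast: check the flag and bail out (the real code would
    // log_info! here) instead of broadcasting. Returns whether we broadcast.
    fn maybe_broadcast_latest_holder_commitment_txn(&self) -> bool {
        if !self.allow_automated_broadcast {
            return false;
        }
        // ...actually hand the commitment transaction(s) to the broadcaster
        true
    }
}
```

Note that the flag only ever moves from `true` to `false`: a later `ChannelForceClosed { should_broadcast: true }` would not re-enable it, matching the "once unsafe, always unsafe" semantics discussed below.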
Some questions I have:

- It feels odd to have a method called `broadcast...` that won't always broadcast. Should we either:
  - Rename the method to `maybe_broadcast_latest_holder_commitment_txn`, so that it's explicit that under certain conditions it will not broadcast?
  - Define a new method `maybe_broadcast...` that wraps the current one but checks the flag first, and use this new method in this specific use case?

  I'm also asking because the method is also called when the de-serialized `ChannelManager` is stale compared to the `ChannelMonitor`, though in this case there is a channel in the funding transactions set (lines 6789-6804). I assume in that case too we want the flag to stop us, but I want to make sure I'm not changing some important semantics.
- Is there any other `ChannelMonitorUpdateStep` that would imply that it's safe again to broadcast? Looking at the docs for the enum I would guess no, but asking just in case.
Heh, 1564 didn't resolve this, it just referenced it...
Yes, your understanding is correct, as for your questions:
> It feels odd to have a method called `broadcast...` that won't always broadcast. Should we either:

Probably the second; it's still important that users be able to broadcast the latest state if they want to, so we at least need an option to broadcast the state. Though it also wouldn't be unreasonable to split it into two methods - `maybe_` and `force_..._unsafe` - to indicate that the first will only work if we think it's okay to do so, and the second will force it to happen even if it's unsafe to do so.
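The two-method split suggested here could look something like the following sketch. These names and signatures are illustrative only, not the actual rust-lightning API; `txn_broadcast` is a hypothetical stand-in for transactions handed to a broadcaster.

```rust
// Hypothetical sketch of the maybe_/force_..._unsafe split; all names
// here are assumptions, not the real API.

struct BroadcastSketch {
    allow_automated_broadcast: bool,
    txn_broadcast: u32, // counts transactions handed to the broadcaster
}

impl BroadcastSketch {
    fn new(allow: bool) -> Self {
        BroadcastSketch { allow_automated_broadcast: allow, txn_broadcast: 0 }
    }

    // Broadcasts only if we believe our state is safe to put on-chain;
    // returns whether anything was actually broadcast.
    fn maybe_broadcast_latest_holder_commitment_txn(&mut self) -> bool {
        if !self.allow_automated_broadcast {
            return false;
        }
        self.force_broadcast_latest_holder_commitment_txn_unsafe();
        true
    }

    // Broadcasts unconditionally. The _unsafe suffix warns callers that,
    // after data loss, the state may be revoked and broadcasting it can
    // cost funds.
    fn force_broadcast_latest_holder_commitment_txn_unsafe(&mut self) {
        self.txn_broadcast += 1;
    }
}
```

The design point is that automated paths (like startup replay) would call the `maybe_` variant, while the `force_..._unsafe` variant remains available for users who explicitly accept the risk.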
> I'm also asking because the method is also called when the de-serialized `ChannelManager` is stale compared to the `ChannelMonitor`, though in this case there is a channel in the funding transactions set (lines 6789-6804). I assume in that case too we want the flag to stop us, but I want to make sure I'm not changing some important semantics.

I think in that case, yes, the flag should stop us (though of course generally the flag is not set).
> Is there any other `ChannelMonitorUpdateStep` that would imply that it's safe again to broadcast? Looking at the docs for the enum I would guess no, but asking just in case.

No, once it's unsafe it's unsafe. Unsafe here means we've lost some data, and there's no way to recover that data.
Great. I'll get to work on the PR then. Thanks!
I'm having a hard time trying to test the code. I'll keep working on it but I'm posting my progress here in case anyone wants to throw help/comments/ideas. My idea would be to cover this new behavior with a simple unit test that:
- Builds a `ChannelMonitor`
- Sends it a `ForceClosed { should_broadcast: false }`
- Writes and reads it again
- And then tries to call the new methods and asserts whether the transaction was broadcast.
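As a self-contained illustration of that test plan, here is a toy version using a stand-in for the real `ChannelMonitor`; the write/read round trip is modeled as a one-byte serialization of the flag. All names here are hypothetical, not the actual rust-lightning types.

```rust
// Toy model of the planned unit test; not real rust-lightning code.

struct ToyMonitor {
    allow_automated_broadcast: bool,
}

impl ToyMonitor {
    // Step 1: build the monitor (flag defaults to true).
    fn new() -> Self {
        ToyMonitor { allow_automated_broadcast: true }
    }

    // Step 2: replay a ForceClosed { should_broadcast: false } update.
    fn force_close(&mut self, should_broadcast: bool) {
        if !should_broadcast {
            self.allow_automated_broadcast = false;
        }
    }

    // Step 3: the flag must survive a write/read round trip.
    fn write(&self) -> Vec<u8> {
        vec![self.allow_automated_broadcast as u8]
    }
    fn read(bytes: &[u8]) -> ToyMonitor {
        ToyMonitor { allow_automated_broadcast: bytes[0] != 0 }
    }

    // Step 4: returns true iff a transaction would reach the broadcaster.
    fn maybe_broadcast(&self) -> bool {
        self.allow_automated_broadcast
    }
}
```

In the real test, step 3 would go through the actual serialization code and step 4 would assert against the transactions a `TestBroadcaster` saw.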
My plan was to copy and adapt some other unit test for `broadcast_latest_holder_commitment_txn`, but I couldn't find any. If I understand correctly, you write unit tests for this at the end of the module and integration tests in separate modules, mostly in functional_tests.rs. I also found:
- A `TestPersister` that I think implements write and read in-memory.
- A `TestBroadcaster` that can probably help me with the assertions.
- A `check_closed_broadcast!` macro, which I believe won't be useful here because it checks whether we broadcast the channel-closed message as a gossip update for network peers. Though I'm not really sure I understood correctly.
- On your last PR, Matt, you added `do_test_data_loss_protect`, which seems to be a more general integration test. Maybe I should just (or also) extend that one?
I left a small pointer on your PR; let's continue there, but in general it's probably simplest and gives the most coverage to add a new functional test based on an existing one.
```
ERROR 2025-01-30T10:33:55.371Z hydra_bitcoin::node::logger:22: A ChannelManager is stale compared to the current ChannelMonitor!
ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The channel will be force-closed and the latest commitment transaction from the ChannelMonitor broadcast.
ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The ChannelMonitor for channel b37b4660dec0340c000ab33a9c877d671a7e4fe3f7f9945cd9704668e760430f is at update_id 27 but the ChannelManager is at update_id 26.
ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The ChannelMonitor for channel b37b4660dec0340c000ab33a9c877d671a7e4fe3f7f9945cd9704668e760430f is at holder commitment number 281474976710644 but the ChannelManager is at holder commitment number 281474976710645.
ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: A ChannelManager is stale compared to the current ChannelMonitor!
ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The channel will be force-closed and the latest commitment transaction from the ChannelMonitor broadcast.
ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The ChannelMonitor for channel d91933328927a0fcd20613150cbb852bf8d9d5873b716ab44ce50d504975b8a4 is at update_id 15 but the ChannelManager is at update_id 13.
ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The ChannelMonitor for channel d91933328927a0fcd20613150cbb852bf8d9d5873b716ab44ce50d504975b8a4 is at holder commitment number 281474976710649 but the ChannelManager is at holder commitment number 281474976710650.
ERROR 2025-01-30T10:33:55.372Z hydra_bitcoin::node::logger:22: The ChannelMonitor for channel d91933328927a0fcd20613150cbb852bf8d9d5873b716ab44ce50d504975b8a4 is at revoked counterparty transaction number 281474976710650 but the ChannelManager is at revoked counterparty transaction number 281474976710651.
```
is this related?
@philipp1992 not quite. It seems an update was made to the channel that was persisted at the ChannelMonitor, but not at the ChannelManager. Are you using async monitor persistence? Did the node crash or not shut down cleanly? Sharing some more logs from before the restart may be helpful.
https://github.com/lightningdevkit/rust-lightning/pull/3893#issuecomment-3014239444
@TheBlueMatt should this issue be closed in favor of https://github.com/lightningdevkit/rust-lightning/pull/3881 that essentially makes this a non-issue?
I don't feel super strongly that we shouldn't do #3893 and then revert it and drop the FC-without-broadcast stuff once we have a cleaner way of doing recovery (vs just dropping it now and adding the recovery mechanism separately). I was dropping the FC-without-broadcast logic now because it seemed easier than fixing it (before eventually dropping it), but I'm open to pushback here. Given I don't think it's been (ever) used (in part because it's always been broken), I would lean towards just ripping it out, but again, definitely open to a broader discussion on that.
Closed in #3881.