Disallow dual-sync-async persistence without restarting
In general, we don't expect users to persist
`ChannelMonitor[Update]`s both synchronously and asynchronously for
a single `ChannelManager` instance. If a user has implemented
asynchronous persistence, they should generally always use that,
as there is then no advantage to them to occasionally persist
synchronously.
Even still, in 920d96edb6595289902f287419de2d002e2dc2ee we fixed
some bugs related to such operation, and noted that "there isn't
much cost to supporting it". Sadly, this is not true.
Specifically, the dual-sync-async persistence flow is ill-defined
and difficult to define in away that a user can realistically
implement.
Consider the case of a `ChannelMonitorUpdate` which is persisted
asynchronously and while it is still being persisted a new
`ChannelMonitorUpdate` is created. If the second
`ChannelMonitorUpdate` is persisted synchronously, the
`ChannelManager` will be left with a single pending
`ChannelMonitorUpdate` which is not the latest.
If we were to then restart, the latest copy of the `ChannelMonitor`
would be that without any updates, but the `ChannelManager` has a
pending `ChannelMonitorUpdate` for the next update, but not the one
after that. The user would then have to handle the replayed
`ChannelMonitorUpdate` and then find the second
`ChannelMonitorUpdate` on disk and somehow know to replay that one
as well.
Further, we currently have a bug in handling this scenario as we'll
complete all pending post-update actions when the second
`ChannelMonitorUpdate` gets persisted synchronously, even though
the first `ChannelMonitorUpdate` is still pending. While we could
rather trivially fix these issues, addressing the larger API
question above is difficult and as we don't anticipate this
use-case being important, we just disable it here.
Note that we continue to support it internally as some 39 tests
rely on it.
Issue highlighted by (changes to the) chanmon_consistency fuzz
target (in the next commit).
👋 Thanks for assigning @valentinewallace as a reviewer! I'll wait for their review and will help manage the review process. Once they submit their review, I'll check if a second reviewer would be helpful.
We may be able to remove the test test_sync_async_persist_doesnt_hang since it specifically covers async + sync persistence.
Would it be a good take-a-Friday issue to fix the legacy tests that use both sync + async so we don't need to gate the panics anymore?
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 90.26%. Comparing base (
83e9e80) to head (cb6219d). Report is 60 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #3737 +/- ##
==========================================
+ Coverage 89.12% 90.26% +1.13%
==========================================
Files 156 158 +2
Lines 123514 134266 +10752
Branches 123514 134266 +10752
==========================================
+ Hits 110086 121195 +11109
+ Misses 10749 10536 -213
+ Partials 2679 2535 -144
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Would it be a good take-a-Friday issue to fix the legacy tests that use both sync + async so we don't need to gate the panics anymore?
Yea, I imagine its a good bit of work, though.
Needs a squash
Squashed with a minor comment fix:
$ git diff-tree -U1 ddccbdabd d009b3918
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index c86eed77b..54b607d88 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -2573,4 +2573,4 @@ where
/// [`ChannelMonitorUpdateStatus::Completed`] without restarting. Because the API does not
- /// otherwise directly enforce this, we enforce it in debug builds here by storing which one is
- /// in use.
+ /// otherwise directly enforce this, we enforce it in non-test builds here by storing which one
+ /// is in use.
#[cfg(not(test))]
These externalized tests seem to be failing:
- test_batch_funding_close_after_funding_signed
- test_batch_channel_open
Oof, ugh, right. Squashed-pushed a fix:
$ git diff-tree -U1 d009b3918 cb6219df6
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index 54b607d88..34751bfc9 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -2575,3 +2575,3 @@ where
/// is in use.
- #[cfg(not(test))]
+ #[cfg(not(any(test, feature = "_externalize_tests")))]
monitor_update_type: AtomicUsize,
@@ -3321,3 +3321,3 @@ macro_rules! handle_new_monitor_update {
ChannelMonitorUpdateStatus::InProgress => {
- #[cfg(not(test))]
+ #[cfg(not(any(test, feature = "_externalize_tests")))]
if $self.monitor_update_type.swap(1, Ordering::Relaxed) == 2 {
@@ -3330,3 +3330,3 @@ macro_rules! handle_new_monitor_update {
ChannelMonitorUpdateStatus::Completed => {
- #[cfg(not(test))]
+ #[cfg(not(any(test, feature = "_externalize_tests")))]
if $self.monitor_update_type.swap(2, Ordering::Relaxed) == 1 {
@@ -3594,3 +3594,3 @@ where
- #[cfg(not(test))]
+ #[cfg(not(any(test, feature = "_externalize_tests")))]
monitor_update_type: AtomicUsize::new(0),
@@ -14767,3 +14767,3 @@ where
- #[cfg(not(test))]
+ #[cfg(not(any(test, feature = "_externalize_tests")))]
monitor_update_type: AtomicUsize::new(0),
Wondering whether the problem where chan mgr will be left with a single pending update which is not the latest can also occur when purely async is used, but channel_monitor_updated are called in the reverse order?
MonitorEvent::Completed is only allowed to signal "all updates to and through the given id have completed" to avoid this case. ChainMonitor tracks this and generates the right MonitorEvetns.
LGTM after squash
Re-dropped the constant after discussion.
Okay, I believe all feedback has been addressed.
Squashed without further changes.