eclair
eclair copied to clipboard
Fix pruned channels coming back online
When a channel was pruned and we receive a valid channel update for it, we need to re-request the corresponding channel announcement. However, we also need to request the corresponding channel updates: if we don't receive them, we will just prune the channel again.
@sstone can you try that on your node and see if that fixes the issue? I'm not sure how to properly test this...also I believe we'll need to deploy the changes to our node as well, otherwise we'll keep sending obsolete updates anyways?
Fixes #2388
Yes it does seem to fix https://github.com/ACINQ/eclair/issues/2388.
Great, thanks for testing it @sstone! Do you have suggestions on how I could unit test that, or should we merge this without tests?
I suggest we merge the change now to make master more usable but keep #2388 open until we understand more precisely what the problem was ?
Don't you think this explanation correctly describes the issue in #2388? That's what lead me to make the change in this PR.
I'm not sure: I can see why this fix is good but not why channels kept being pruned. Unless you mean that we never received the ChannelUpdates that caused the channel to be pruned in the first place because we were not asking for it ? It makes sense but then how did we identify that such channels were actually not stale, and why did they get pruned so quickly ?
It makes sense but then how did we identify that such channels were actually not stale, and why did they get pruned so quickly ?
The way I understand it, we first receive one non-stale channel update (because one of the two nodes is available). This triggers the following code:
https://github.com/ACINQ/eclair/blob/323aeec09c57194a669da0fec649f9509de850e0/eclair-core/src/main/scala/fr/acinq/eclair/router/Validation.scala#L435
We correctly identify that this channel may not be stale anymore, but we currently have only one of the two channel updates and we had deleted the channel announcement, so we use gossip queries to fetch them again:
https://github.com/ACINQ/eclair/blob/323aeec09c57194a669da0fec649f9509de850e0/eclair-core/src/main/scala/fr/acinq/eclair/router/Validation.scala#L442
But since our gossip queries doesn't ask for the channel updates, we actually only receive the channel_announcement.
Then after a while, we receive TickPruneStaleChannels to regularly check for stale channels.
This uses the following variation of the pruning checks:
https://github.com/ACINQ/eclair/blob/a97e88fae19b806d5b81f4956acc388f4490f3d5/eclair-core/src/main/scala/fr/acinq/eclair/router/StaleChannels.scala#L93
But the channel updates are None, because we didn't request them through our gossip queries, so the channel is identified as staled again and pruned.
If we had correctly received the channel updates (what this PR is doing) and the channel wasn't stale, we wouldn't prune it again. Does that make sense?
@sstone @pm47 I've updated our logic to store more information in the pruned DB to ensure we don't constantly restore channels that are actually half-dead just to prune them again at the next tick of TickPruneStaleChannels, let me know what you think.
Codecov Report
Merging #2405 (2277858) into master (6e381d4) will increase coverage by
0.08%. The diff coverage is90.17%.
@@ Coverage Diff @@
## master #2405 +/- ##
==========================================
+ Coverage 84.84% 84.92% +0.08%
==========================================
Files 198 198
Lines 15796 15843 +47
Branches 680 642 -38
==========================================
+ Hits 13402 13455 +53
+ Misses 2394 2388 -6
| Impacted Files | Coverage Δ | |
|---|---|---|
| .../main/scala/fr/acinq/eclair/db/DualDatabases.scala | 10.79% <0.00%> (-0.16%) |
:arrow_down: |
| .../src/main/scala/fr/acinq/eclair/db/NetworkDb.scala | 100.00% <ø> (ø) |
|
| ...n/scala/fr/acinq/eclair/router/StaleChannels.scala | 91.66% <78.57%> (+1.34%) |
:arrow_up: |
| ...main/scala/fr/acinq/eclair/router/Validation.scala | 94.28% <91.66%> (+0.16%) |
:arrow_up: |
| ...main/scala/fr/acinq/eclair/db/pg/PgNetworkDb.scala | 99.35% <100.00%> (+0.06%) |
:arrow_up: |
| ...la/fr/acinq/eclair/db/sqlite/SqliteNetworkDb.scala | 99.09% <100.00%> (+0.15%) |
:arrow_up: |
| ...cinq/eclair/channel/publish/MempoolTxMonitor.scala | 88.23% <0.00%> (-2.36%) |
:arrow_down: |
| ...cala/fr/acinq/eclair/payment/relay/NodeRelay.scala | 95.86% <0.00%> (-0.69%) |
:arrow_down: |
| ...in/scala/fr/acinq/eclair/channel/fsm/Channel.scala | 86.16% <0.00%> (-0.12%) |
:arrow_down: |
| ...fr/acinq/eclair/channel/InteractiveTxBuilder.scala | 85.60% <0.00%> (+0.52%) |
:arrow_up: |
| ... and 3 more |
It's much better now and seems to have fixed the "prune/request channel update/prune/..." loop !
This works, but I think we can go a step further and remove the pruned_channels table.
That's a great idea, I started implementing this and it's looking good. I'll open a separate PR that will supersede this one.
Superseded by #2456