eclair icon indicating copy to clipboard operation
eclair copied to clipboard

Fix pruned channels coming back online

Open t-bast opened this issue 3 years ago • 6 comments
trafficstars

When a channel was pruned and we receive a valid channel update for it, we need to re-request the corresponding channel announcement. However, we also need to request the corresponding channel updates: if we don't receive them, we will just prune the channel again.

@sstone can you try that on your node and see if that fixes the issue? I'm not sure how to properly test this...also I believe we'll need to deploy the changes to our node as well, otherwise we'll keep sending obsolete updates anyways?

Fixes #2388

t-bast avatar Sep 01 '22 09:09 t-bast

Yes it does seem to fix https://github.com/ACINQ/eclair/issues/2388.

sstone avatar Sep 12 '22 11:09 sstone

Great, thanks for testing it @sstone! Do you have suggestions on how I could unit test that, or should we merge this without tests?

t-bast avatar Sep 13 '22 13:09 t-bast

I suggest we merge the change now to make master more usable but keep #2388 open until we understand more precisely what the problem was ?

sstone avatar Sep 13 '22 15:09 sstone

Don't you think this explanation correctly describes the issue in #2388? That's what lead me to make the change in this PR.

t-bast avatar Sep 13 '22 15:09 t-bast

I'm not sure: I can see why this fix is good but not why channels kept being pruned. Unless you mean that we never received the ChannelUpdates that caused the channel to be pruned in the first place because we were not asking for it ? It makes sense but then how did we identify that such channels were actually not stale, and why did they get pruned so quickly ?

sstone avatar Sep 14 '22 08:09 sstone

It makes sense but then how did we identify that such channels were actually not stale, and why did they get pruned so quickly ?

The way I understand it, we first receive one non-stale channel update (because one of the two nodes is available). This triggers the following code:

https://github.com/ACINQ/eclair/blob/323aeec09c57194a669da0fec649f9509de850e0/eclair-core/src/main/scala/fr/acinq/eclair/router/Validation.scala#L435

We correctly identify that this channel may not be stale anymore, but we currently have only one of the two channel updates and we had deleted the channel announcement, so we use gossip queries to fetch them again:

https://github.com/ACINQ/eclair/blob/323aeec09c57194a669da0fec649f9509de850e0/eclair-core/src/main/scala/fr/acinq/eclair/router/Validation.scala#L442

But since our gossip queries doesn't ask for the channel updates, we actually only receive the channel_announcement. Then after a while, we receive TickPruneStaleChannels to regularly check for stale channels. This uses the following variation of the pruning checks:

https://github.com/ACINQ/eclair/blob/a97e88fae19b806d5b81f4956acc388f4490f3d5/eclair-core/src/main/scala/fr/acinq/eclair/router/StaleChannels.scala#L93

But the channel updates are None, because we didn't request them through our gossip queries, so the channel is identified as staled again and pruned.

If we had correctly received the channel updates (what this PR is doing) and the channel wasn't stale, we wouldn't prune it again. Does that make sense?

t-bast avatar Sep 14 '22 08:09 t-bast

@sstone @pm47 I've updated our logic to store more information in the pruned DB to ensure we don't constantly restore channels that are actually half-dead just to prune them again at the next tick of TickPruneStaleChannels, let me know what you think.

t-bast avatar Oct 03 '22 15:10 t-bast

Codecov Report

Merging #2405 (2277858) into master (6e381d4) will increase coverage by 0.08%. The diff coverage is 90.17%.

@@            Coverage Diff             @@
##           master    #2405      +/-   ##
==========================================
+ Coverage   84.84%   84.92%   +0.08%     
==========================================
  Files         198      198              
  Lines       15796    15843      +47     
  Branches      680      642      -38     
==========================================
+ Hits        13402    13455      +53     
+ Misses       2394     2388       -6     
Impacted Files Coverage Δ
.../main/scala/fr/acinq/eclair/db/DualDatabases.scala 10.79% <0.00%> (-0.16%) :arrow_down:
.../src/main/scala/fr/acinq/eclair/db/NetworkDb.scala 100.00% <ø> (ø)
...n/scala/fr/acinq/eclair/router/StaleChannels.scala 91.66% <78.57%> (+1.34%) :arrow_up:
...main/scala/fr/acinq/eclair/router/Validation.scala 94.28% <91.66%> (+0.16%) :arrow_up:
...main/scala/fr/acinq/eclair/db/pg/PgNetworkDb.scala 99.35% <100.00%> (+0.06%) :arrow_up:
...la/fr/acinq/eclair/db/sqlite/SqliteNetworkDb.scala 99.09% <100.00%> (+0.15%) :arrow_up:
...cinq/eclair/channel/publish/MempoolTxMonitor.scala 88.23% <0.00%> (-2.36%) :arrow_down:
...cala/fr/acinq/eclair/payment/relay/NodeRelay.scala 95.86% <0.00%> (-0.69%) :arrow_down:
...in/scala/fr/acinq/eclair/channel/fsm/Channel.scala 86.16% <0.00%> (-0.12%) :arrow_down:
...fr/acinq/eclair/channel/InteractiveTxBuilder.scala 85.60% <0.00%> (+0.52%) :arrow_up:
... and 3 more

codecov-commenter avatar Oct 03 '22 16:10 codecov-commenter

It's much better now and seems to have fixed the "prune/request channel update/prune/..." loop !

sstone avatar Oct 06 '22 15:10 sstone

This works, but I think we can go a step further and remove the pruned_channels table.

That's a great idea, I started implementing this and it's looking good. I'll open a separate PR that will supersede this one.

t-bast avatar Oct 11 '22 16:10 t-bast

Superseded by #2456

t-bast avatar Oct 13 '22 19:10 t-bast