
Coop close stuck at CHANNELD_SHUTTING_DOWN, peer waiting for announcement signatures

Open zerofeerouting opened this issue 2 years ago • 6 comments

Issue and Steps to Reproduce

I've had problems coop closing channels since yesterday.

On some channel closures the node seems to be stuck and not finalizing the close.

Command executed:

docker exec -it clightning lightning-cli close [peer_id]

After I ran the command, the console was unresponsive for more than 10 minutes, until I exited with Ctrl+C. Checking the channel with listpeers gives the following information:

// ...
               "state_changes": [
                  {
                     "timestamp": "2022-03-27T11:09:12.297Z",
                     "old_state": "CHANNELD_AWAITING_LOCKIN",
                     "new_state": "CHANNELD_NORMAL",
                     "cause": "remote",
                     "message": "Lockin complete"
                  },
                  {
                     "timestamp": "2022-04-19T07:45:13.960Z",
                     "old_state": "CHANNELD_NORMAL",
                     "new_state": "CHANNELD_SHUTTING_DOWN",
                     "cause": "user",
                     "message": "User or plugin invoked close command"
                  }
               ],
               "status": [
                  "CHANNELD_SHUTTING_DOWN:Received error channel 50e491fed1609d3de84ec50e69654fa2665d41379cfeed636a5939c1ad16837c: link failed to shutdown",
                  "CHANNELD_SHUTTING_DOWN:Funding transaction locked. They need our announcement signatures. They've sent shutdown, waiting for ours"
               ],
// ...

This is what my log shows:

2022-04-19T07:10:53.286Z UNUSUAL [peer_id]-channeld-chan#937: Status closed, but waitpid 82838 says No child processes
2022-04-19T07:10:53.287Z INFO    [peer_id]-chan#937: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (-1)
2022-04-19T07:45:13.958Z INFO    [peer_id]-chan#937: State changed from CHANNELD_NORMAL to CHANNELD_SHUTTING_DOWN
2022-04-19T07:45:14.735Z INFO    [peer_id]-chan#937: Peer transient failure in CHANNELD_SHUTTING_DOWN: channeld WARNING: error channel [peer_id]: link failed to shutdown
2022-04-19T07:45:14.735Z UNUSUAL [peer_id]-channeld-chan#937: Status closed, but waitpid 85183 says No child processes
2022-04-19T07:45:19.805Z INFO    [peer_id]-chan#937: Peer transient failure in CHANNELD_SHUTTING_DOWN: channeld WARNING: error channel 50e491fed1609d3de84ec50e69654fa2665d41379cfeed636a5939c1ad16837c: link failed to shutdown

getinfo output

{
   "id": "038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5",
   "alias": "zero fee routing | CLN",
   "color": "1c262f",
   "num_peers": 675,
   "num_pending_channels": 0,
   "num_active_channels": 650,
   "num_inactive_channels": 16,
   "address": [
      {
         "type": "ipv4",
         "address": "93.177.73.229",
         "port": 9735
      },
      {
         "type": "torv3",
         "address": "xtdo5qvvfwcjaruj6z4acdcw4azagn6tdgac4ajnekjdn4ghr6qw2nqd.onion",
         "port": 9735
      }
   ],
   "binding": [
      {
         "type": "ipv4",
         "address": "0.0.0.0",
         "port": 9735
      }
   ],
   "version": "v0.10.2",
   "blockheight": 732526,
   "network": "bitcoin",
   "msatoshi_fees_collected": 248529299,
   "fees_collected_msat": "248529299msat",
   "lightning-dir": "/root/.lightning/bitcoin"
}

zerofeerouting, Apr 19 '22 08:04

The peer is sending "link failed to shutdown" error messages. I don't know what that means though: some LND error message?

rustyrussell, Apr 19 '22 08:04

Yes, LND I assume.

So basically they tell us "link failed to shutdown" and that keeps us stuck? The process did not quit on its own; I had to kill it with Ctrl+C.

This looks like an LND issue that CLN is not handling correctly (because the process doesn't quit on its own).

zerofeerouting, Apr 19 '22 08:04

Okay. I think I found the reason why the coop close didn't go through: There were pending HTLCs.

So CLN seems to wait for those to clear before coop-closing the channel - which is actually desired behaviour.

BUT should the command

lightning-cli close [peer_id]

not return something meaningful, like "Channel has [amount_htlc] pending HTLCs, trying cooperative close for [timeout] seconds before force-closing the channel", instead of being stuck and not returning anything at all?
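Until the command reports this itself, the number of pending HTLCs can be checked by hand before calling close. The following is a minimal sketch, not part of CLN: it assumes the JSON printed by `lightning-cli listpeers` has been parsed into a dict, and that each channel entry carries an `htlcs` array (that field is elided in the output quoted earlier, so treat it as an assumption). The helper name `count_pending_htlcs` and the sample data are hypothetical.

```python
def count_pending_htlcs(listpeers_output: dict, peer_id: str) -> int:
    """Count HTLCs still in flight on all channels with the given peer.

    Assumes the parsed `listpeers` layout: peers -> channels -> htlcs.
    """
    total = 0
    for peer in listpeers_output.get("peers", []):
        if peer.get("id") != peer_id:
            continue
        for channel in peer.get("channels", []):
            # A missing "htlcs" key is treated as "no pending HTLCs".
            total += len(channel.get("htlcs", []))
    return total


# Hypothetical sample data, shaped like a trimmed listpeers result.
sample = {
    "peers": [
        {
            "id": "02aa",
            "channels": [
                {"state": "CHANNELD_NORMAL", "htlcs": [{"id": 1}, {"id": 2}]},
            ],
        },
        {"id": "03bb", "channels": [{"state": "CHANNELD_NORMAL", "htlcs": []}]},
    ]
}

pending = count_pending_htlcs(sample, "02aa")
print(f"Channel has {pending} pending HTLCs")
```

A non-zero count means a cooperative close will likely have to wait for those HTLCs to resolve first.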

zerofeerouting, Apr 19 '22 08:04

We could add notifications that get printed on the CLI. However, the command itself is meant to be synchronous, i.e., to return only once the desired result has been achieved. In this case that means it returns only once the channel is closed (and some tools rely on this behavior).

cdecker, Apr 19 '22 08:04

Yes, it will only return once the close has succeeded. That can take a long time with a peer doing weird stuff like this!

However, this is exactly what notifications are for! We should absolutely wire more of them up for the next version, so we return useful info!!

rustyrussell, Apr 19 '22 09:04

Yes, a notification would be very helpful.

I exited the process with Ctrl+C on another channel and just waited. The channel was closed cooperatively after all HTLCs had cleared. So printing info about the pending HTLCs while the command keeps waiting would be OK, IMO.
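For anyone who wants to watch progress after interrupting the CLI, here is a rough sketch of reading the latest state from a channel entry in the listpeers output. It uses only the `state_changes` and `status` fields shown in my first post, and it assumes `state_changes` is in chronological order, as it appears there. The helper name `latest_channel_state` is made up for illustration.

```python
def latest_channel_state(channel: dict) -> dict:
    """Return the most recent state change and the status lines of a channel."""
    changes = channel.get("state_changes", [])
    # Assumption: the last entry is the newest one.
    latest = changes[-1] if changes else {}
    return {
        "state": latest.get("new_state", "UNKNOWN"),
        "cause": latest.get("cause", "unknown"),
        "status": list(channel.get("status", [])),
    }


# Channel entry trimmed from the listpeers output quoted in this issue.
channel = {
    "state_changes": [
        {
            "old_state": "CHANNELD_AWAITING_LOCKIN",
            "new_state": "CHANNELD_NORMAL",
            "cause": "remote",
        },
        {
            "old_state": "CHANNELD_NORMAL",
            "new_state": "CHANNELD_SHUTTING_DOWN",
            "cause": "user",
        },
    ],
    "status": [
        "CHANNELD_SHUTTING_DOWN:Funding transaction locked. They need our "
        "announcement signatures. They've sent shutdown, waiting for ours"
    ],
}

summary = latest_channel_state(channel)
print(summary["state"], summary["cause"])
```

As long as this keeps reporting CHANNELD_SHUTTING_DOWN, the close is still waiting rather than complete.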

Right now I know what the behaviour means and don't worry anymore, but others might feel differently.

Thank you @rustyrussell and @cdecker

zerofeerouting, Apr 19 '22 09:04