
[bug]: "edge not found" for local channel

Open C-Otto opened this issue 2 years ago • 20 comments

Background

A peer opened a channel to my node. This channel is listed in lncli listchannels. However, lncli getchaninfo X shows an error (with X being the channel ID):

[lncli] rpc error: code = Unknown desc = edge not found

According to lncli listchannels, this appears to be a regular (inactive) channel: it's not private, it has "remote_constraints", and there's an ID in "alias_scids". It's not a zero-conf channel. Strangely, there are already two incoming HTLCs where "forwarding_channel" is set to 0. The channel is currently inactive, but it has been active before.

When reconnecting to the peer, I get:

[INF] PEER: Peer(xxx): loading ChannelPoint(xxx:1)
[WRN] PEER: Peer(xxx): Unable to find our forwarding policy for channel xxx:1, using default values

In my logs I also see:

[INF] RPCS: channel: xxx:1 not found by channel event store

I'd be happy to provide further information.

Your environment

  • lnd v0.15.5-beta
  • Linux server 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux
  • bitcoind v24.0.1

C-Otto avatar Dec 18 '22 21:12 C-Otto

Something weird happened during channel open. It looks like the second open attempt worked?

[INF] FNDG: Recv'd fundingRequest(amt=0.xxx BTC, push=0 mSAT, delay=144, pendingId=xxx) from peer(xxx)
[INF] FNDG: Requiring 4 confirmations for pendingChan(xxx): amt=0.xxx BTC, push_amt=0 mSAT, committype=tweakless, upfrontShutdown=
[INF] FNDG: Cancelling funding reservation for node_key=xxx, chan_id=xxx
[INF] PEER: Peer(xxx): unable to read message from peer: EOF
[ERR] FNDG: received funding error from xxx: chan_id=xxx, err=Channel open canceled by us

A few seconds later:
[INF] FNDG: Recv'd fundingRequest(amt=0.xxx BTC, push=0 mSAT, delay=144, pendingId=xxx) from peer(xxx)
[INF] FNDG: Requiring 4 confirmations for pendingChan(xxx): amt=0.xxx BTC, push_amt=0 mSAT, committype=tweakless, upfrontShutdown=
[INF] FNDG: Sending fundingResp for pending_id(xxx)
[INF] FNDG: completing pending_id(xxx) with ChannelPoint(xxx:1)
[INF] FNDG: sending FundingSigned for pending_id(xxx) over ChannelPoint(xxx:1)

The open transaction is confirmed and, according to mempool.space, opened two other channels to other peers.

C-Otto avatar Dec 18 '22 21:12 C-Otto

I also get this:

[WRN] HSWC: ChannelLink(xxx:1): received warning message from peer: chan_id=0000000000000000000000000000000000000000000000000000000000000000, err=Malformed node_announcement [...]

The node_announcement is extremely long, filling roughly one screen.

C-Otto avatar Dec 18 '22 21:12 C-Otto

Discussed offline, will post more details here after proper investigation.

yyforyongyu avatar Dec 19 '22 08:12 yyforyongyu

I think this is distinct from https://github.com/lightningnetwork/lnd/issues/7229 since this was not a zero-conf channel

Crypt-iQ avatar Jan 05 '23 22:01 Crypt-iQ

I get this issue too for a few channels.

dannydeezy avatar Apr 19 '23 19:04 dannydeezy

@dannydeezy with what version of lnd? @C-Otto does this still happen with lnd 0.16.1?

guggero avatar Apr 20 '23 07:04 guggero

I'm running lnd 0.16.0 with some WT-related commits, not 0.16.1. Currently, I don't see this issue. The last time the error message appeared in my logs was 2023-04-09. I had updated to lnd 0.16.0 a few days before, so I don't think this is resolved yet.

C-Otto avatar Apr 20 '23 15:04 C-Otto

We're running into this also, on nodes running 0.16.0

litch avatar Apr 20 '23 17:04 litch

For those who recently experienced this bug, please can you try updating to 0.16.1-beta.rc1 to see if that fixes your issue?

It might be that your issue is the same as this user's, which is distinct from this issue even though it throws the same error message. That user's issue was fixed in 0.16.1-beta.rc1.

ellemouton avatar Apr 21 '23 13:04 ellemouton

Same here after upgrading to 0.16.1-beta.rc3:

[INF] DISC: Creating new GossipSyncer for peer=$peer_id
[ERR] NANN: Unable to retrieve chan status for Channel($tx_id:1): edge not found
[INF] DISC: GossipSyncer($peer_id): applying new update horizon: start=2106-02-07 03:11:07 +1100 LHDT, end=2242-03-16 13:56:30 +1100 LHDT, backlog_size=0
[INF] PEER: Peer($peer_id): unable to read message from peer: EOF
[INF] PEER: Peer($peer_id): disconnecting $peer_id@$address, reason: read handler closed
[INF] NTFN: Cancelling epoch notification, epoch_id=39
[INF] DISC: Removing GossipSyncer for peer=$peer_id
[INF] HSWC: ChannelLink($tx_id:1): stopping
[WRN] HSWC: ChannelLink($tx_id:1): error when syncing channel states: link shutting down
[INF] HSWC: ChannelLink($tx_id:1): exited
[INF] HSWC: Removing channel link with ChannelID(x)

Edit: After restarting only LND node the problem did not go away. However, after restarting the other node (CLN) everything started to work as expected.

theborakompanioni avatar Apr 24 '23 10:04 theborakompanioni

Thanks for the info @theborakompanioni! Question to everyone else experiencing this issue - is your channel peer for the problem channel a CLN node?

ellemouton avatar Apr 24 '23 10:04 ellemouton

No, both sides of the channel were LND in our case (we saw this a few times)

litch avatar Apr 25 '23 16:04 litch

I have same case with one of my peers.

My node: LND 0.16.0-beta (commit=v0.16.0-beta)
Peer node: LND 0.16.0

We both had "[lncli] rpc error: code = Unknown desc = edge not found"

(!) The issue was resolved on my peer's side by upgrading LND 0.16.0 -> 0.16.1-beta. Immediately after that the problem disappeared and the channel became active.

On my side the issue is currently active, obviously I need to upgrade my LND to 0.16.1 too, but I haven't done it yet.

Hope this information is helpful.

Additional logs from my side:


2023-04-22 12:42:57.884 [WRN] NANN: Unable to find channel policies for xxx:1, skipping. This is typical if the channel is in the process of closing.
...
2023-04-22 12:42:58.114 [INF] PEER: Peer(xxx): loading ChannelPoint(xxx:1)
2023-04-22 12:42:58.114 [WRN] PEER: Peer(xxx): Unable to find our forwarding policy for channel xxx:1, using default values
...
2023-04-22 12:42:58.344 [INF] HSWC: ChannelLink(1863a7ca2bb10229e44b158223aa04e0e3a185b056f55263e3e5b0d4291a5235:1): starting
...
2023-04-22 12:42:58.358 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(xxx:1)
2023-04-22 12:42:58.358 [INF] HSWC: ChannelLink(xxx:1): HTLC manager started, bandwidth=0 mSAT
2023-04-22 12:42:58.358 [INF] HSWC: ChannelLink(xxx:1): attempting to re-synchronize
2023-04-22 12:42:58.358 [WRN] HSWC: ChannelLink(xxx:1): error when syncing channel states: unable to send chan sync message for ChannelPoint(xxx:1): peer exiting
2023-04-22 12:42:58.359 [ERR] HSWC: ChannelLink(xxx:1): failing link: unable to synchronize channel states: unable to send chan sync message for ChannelPoint(xxx:1): peer exiting with error: unable to resume channel, recovery required
2023-04-22 12:42:58.359 [ERR] FNDG: Unable to advance state(xxx:1): failed adding to router graph: error generating channel announcement: unable to generate channel update announcement: channel not found
2023-04-22 12:42:58.359 [INF] HSWC: ChannelLink(xxx:1): exited

SunnySarahNode avatar Apr 25 '23 18:04 SunnySarahNode

Hi @SunnySarahNode - I think your specific case was solved in 0.16.1-beta (in https://github.com/lightningnetwork/lnd/pull/7613). Please can you update & see if that works? Your peer managing to solve the problem on their side by upgrading makes me think it is the same as this user: https://github.com/lightningnetwork/lnd/issues/6987#issuecomment-1510377248

ellemouton avatar Apr 25 '23 18:04 ellemouton

Closing this issue, as the fix has been provided with 0.16.1. Feel free to reopen if the problem persists.

saubyk avatar Aug 30 '23 01:08 saubyk

@saubyk The fix in 0.16.1 fixes this for new channels, but does not retroactively fix old ones. This is still an issue for any channel that already ended up in this state. Note that it affects forwarding of HTLCs as well, because the channel policy cannot be found for the edge, which makes lnd fall back to the default channel policy.
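One way to spot channels still stuck in this state is to cross-check the set of open channels against the set of edges the graph actually knows about. A minimal sketch of that check, with illustrative data standing in for real lncli listchannels / getchaninfo output:

```go
package main

import "fmt"

// missingEdges returns every open channel whose edge is absent from
// the graph - i.e. the channels that would answer "edge not found"
// to a getchaninfo query. The inputs are illustrative; in practice
// they would come from listchannels and describegraph.
func missingEdges(openChans []string, graphEdges map[string]bool) []string {
	var missing []string
	for _, cp := range openChans {
		if !graphEdges[cp] {
			missing = append(missing, cp)
		}
	}
	return missing
}

func main() {
	open := []string{"aaa:0", "bbb:1", "ccc:1"}
	graph := map[string]bool{"aaa:0": true, "ccc:1": true}
	fmt.Println(missingEdges(open, graph))
}
```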

JssDWt avatar Aug 30 '23 12:08 JssDWt

Hello, I have the same issue. I recently opened a channel to a peer, but I get the same error mentioned. I restarted LND, disconnected and connected again to the peer. Didn't help.

When I run getchaninfo, I get the following error:

lncli getchaninfo 9133577121506xxxxx
[lncli] rpc error: code = Unknown desc = edge not found

Version: 0.17.4-beta

I have attached the log trace of the opening transaction till I received the error. boltz-open-channel.log

Maybe you can make some sense out of it. Would be nice to be able to use the channel.

Big thanks!!

miephos avatar Feb 21 '24 22:02 miephos

From the logs, the policy was not found even in lnd v0.17.4:

2024-02-16 08:51:13.826 [WRN] PEER: Peer(02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018): Unable to find our forwarding policy for channel c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1, using default values

which is originated here when the edge cannot be found: https://github.com/lightningnetwork/lnd/blob/98c52df4d2378512b721f4e1140efc5ac0ef5904/peer/brontide.go#L1024-L1026
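A simplified sketch of that fallback (types and names here are illustrative, not lnd's actual API): when the edge lookup fails, the link logs a warning and carries on with default policy values, which is the silent degradation this thread describes.

```go
package main

import (
	"errors"
	"fmt"
)

// ForwardingPolicy is a stand-in for the per-channel policy lnd
// reads from the graph edge.
type ForwardingPolicy struct {
	BaseFeeMsat   int64
	FeeRatePPM    int64
	TimeLockDelta uint32
}

var errEdgeNotFound = errors.New("edge not found")

// fetchPolicy stands in for the graph edge lookup.
func fetchPolicy(edges map[string]ForwardingPolicy, chanPoint string) (ForwardingPolicy, error) {
	p, ok := edges[chanPoint]
	if !ok {
		return ForwardingPolicy{}, errEdgeNotFound
	}
	return p, nil
}

// policyOrDefault mirrors the behaviour in the cited brontide code:
// on lookup failure it warns and substitutes defaults, so the channel
// keeps working but forwards with the wrong fee settings.
func policyOrDefault(edges map[string]ForwardingPolicy, chanPoint string, def ForwardingPolicy) ForwardingPolicy {
	p, err := fetchPolicy(edges, chanPoint)
	if err != nil {
		fmt.Printf("unable to find forwarding policy for %s, using defaults: %v\n", chanPoint, err)
		return def
	}
	return p
}

func main() {
	def := ForwardingPolicy{BaseFeeMsat: 1000, FeeRatePPM: 1, TimeLockDelta: 40}
	got := policyOrDefault(map[string]ForwardingPolicy{}, "xxx:1", def)
	fmt.Println(got)
}
```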

These four lines are also weird,

2024-02-16 08:51:37.856 [ERR] NANN: Unable to retrieve chan status for Channel(c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1): edge not found
2024-02-16 08:52:37.843 [ERR] NANN: Unable to retrieve chan status for Channel(c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1): edge not found
2024-02-16 08:53:37.842 [ERR] NANN: Unable to retrieve chan status for Channel(c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1): edge not found
2024-02-16 08:54:37.841 [ERR] NANN: Unable to retrieve chan status for Channel(c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1): edge not found

We'd expect markPendingInactiveChannels to be called every 30s (the default) here, but it's only called 4 times instead: https://github.com/lightningnetwork/lnd/blob/98c52df4d2378512b721f4e1140efc5ac0ef5904/netann/chan_status_manager.go#L353-L359

There's also a disconnect/connect attempt here; I assume it was triggered manually?

2024-02-16 08:51:12.148 [INF] SRVR: Disconnecting from 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018@45.86.229.190:9736
2024-02-16 08:51:12.148 [INF] PEER: Peer(02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018): disconnecting 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018@45.86.229.190:9736, reason: server: DisconnectPeer called
2024-02-16 08:51:12.148 [INF] PEER: Peer(02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018): unable to read message from peer: read next header: read tcp 10.10.21.21:55308->45.86.229.190:9736: use of closed network connection
2024-02-16 08:51:12.157 [ERR] RPCS: [connectpeer]: error connecting to peer: already connected to peer: 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018@45.86.229.190:9736
2024-02-16 08:51:12.158 [ERR] RPCS: [/lnrpc.Lightning/ConnectPeer]: already connected to peer: 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018@45.86.229.190:9736
2024-02-16 08:51:12.248 [INF] DISC: Removing GossipSyncer for peer=02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018

@miephos I've noticed the chan was forced closed - do you by chance still have the logs? What about the channel type, like is it a public or private chan?

yyforyongyu avatar Jun 06 '24 17:06 yyforyongyu

@miephos I've noticed the chan was forced closed - do you by chance still have the logs? What about the channel type, like is it a public or private chan?

The channel was a public channel. I did a manual disconnect and connect to the peer. It still seemed stuck, so I force closed the channel.

I still have the logs. Let me know how to provide them, if needed.

miephos avatar Jun 22 '24 21:06 miephos

@miephos it'd be great to provide some logs! You can find me via the handle yyforyongyu on slack/twitter/keybase, thanks!

yyforyongyu avatar Jun 24 '24 13:06 yyforyongyu

Yeah, really strange behaviour. Would it make sense to bring this log up to the Error level, so that we at least know when the edge failed to be added to the graph? Otherwise there are situations where we basically shadow the error, which makes it really hard to tell what happened if node runners don't run the gossiper at the debug level.

https://github.com/lightningnetwork/lnd/blob/71ba355d903f01990718866c358c2ccac9402891/discovery/gossiper.go#L2532-L2533
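A minimal sketch of the suggestion (function names are illustrative, not lnd's): log the graph-insert failure at Error level and propagate it, instead of shadowing it behind a Debug-only message.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

var errGraphInsert = errors.New("error generating channel announcement")

// addEdgeToGraph stands in for the graph insertion that can fail.
func addEdgeToGraph(fail bool) error {
	if fail {
		return errGraphInsert
	}
	return nil
}

// processAnnouncement surfaces the failure instead of swallowing it:
// it logs at Error level (visible at default verbosity) and wraps the
// error so callers can still see the cause.
func processAnnouncement(fail bool) error {
	if err := addEdgeToGraph(fail); err != nil {
		log.Printf("[ERR] unable to add edge to graph: %v", err)
		return fmt.Errorf("adding edge: %w", err)
	}
	return nil
}

func main() {
	if err := processAnnouncement(true); err != nil {
		fmt.Println("surfaced:", err)
	}
}
```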

ziggie1984 avatar Jul 01 '24 10:07 ziggie1984