lnd
lnd copied to clipboard
gossip: node announcement ignored because node not in graph
Background
While running integration tests in the 0-19-staging branch of litd, I noticed that sometimes we don't have a node announcement for a direct channel peer.
I tracked it down to these log lines:
2024-06-26 14:22:35.525 [DBG] PEER: Peer(02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a): Received NodeAnnouncement(node=02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a, update_time=2024-06-26 14:22:35 +0200 CEST) from 02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a@127.0.0.1:11702
2024-06-26 14:22:35.525 [TRC] PEER: Peer(02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a): readMessage from peer: (*lnwire.NodeAnnouncement)(0xc0036a0000)({
Signature: (lnwire.Sig) {
bytes: ([64]uint8) (len=64 cap=64) {
00000000 1c c7 06 12 24 a9 aa 3e 9b 0e e7 7b 52 16 6e 51 |....$..>...{R.nQ|
00000010 51 ad 22 51 f3 32 6c 0b fd 52 c0 6a e4 e5 02 d3 |Q."Q.2l..R.j....|
00000020 18 30 68 3b a0 ce b6 f8 d9 90 1c 92 92 d7 b5 27 |.0h;...........'|
00000030 d1 f5 98 61 03 32 91 90 97 73 8c 06 e4 df d0 da |...a.2...s......|
},
...
<snip>
...
}
})
...
2024-06-26 14:22:35.525 [TRC] DISC: Received network message: peer=02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a@127.0.0.1:11702, msg=NodeAnnouncement, is_remote=true
2024-06-26 14:22:35.525 [DBG] DISC: Processing NodeAnnouncement: peer=02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a@127.0.0.1:11702, timestamp=2024-06-26 14:22:35 +0200 CEST, node=02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a
2024-06-26 14:22:35.525 [DBG] CRTR: Checking stale node 303233333365316439313662376433386231613932306565613136363263386530373233656235663236303462643531643331666162396139376464613237373561 got Ignoring node announcement for node not found in channel graph (02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a)
2024-06-26 14:22:35.525 [DBG] DISC: Skipped processing stale node: 02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a
2024-06-26 14:22:35.525 [TRC] DISC: Processed network message NodeAnnouncement, returned len(announcements)=0, allowDependents=true
This is after we've opened an unannounced channel to that node. And it doesn't happen all the time, so it must be a timing issue.
It's pretty weird that we apparently haven't seen this issue in vanilla lnd yet, or at least it doesn't seem like this happens in the itests there. Or maybe this is an existing flake that we just haven't noticed yet?
Here's the full log (litd in integrated mode with custom channel logic):
3-test_custom_channels_force_close-Charlie-03df097b.log
I've created this naive fix which solves the issue in the integration test: https://github.com/lightningnetwork/lnd/pull/8871 But it sounds like it's mostly fighting a symptom instead of fixing the root cause. But I'm not super familiar with the timings and dependency management of gossip stuff, so looking for other opinions.
cc @ellemouton, @Crypt-iQ
what are these services?
2024-06-26 14:22:35.524 [TRC] TCHN: Request to handle: *lnwire.ChannelUpdate
2024-06-26 14:22:35.524 [TRC] TCHN: Request to handle: 258
2024-06-26 14:22:35.524 [DBG] TCHN: Failed to handle: *lnwire.ChannelUpdate
2024-06-26 14:22:35.524 [DBG] PFSM: MsgRouter: unable to route msg protofsm.PeerMsg
Did some analysis on the logs, here's what happened.
Suppose Alice opens a channel with Bob, the normal flow, from Alice's side,
- received
channel_ready, handle it inhandleChannelReadyReceived, that callsaddToRouterGraph. - send
channel_announcementto herself -> creates the node and channel data in db. - send
channel_updateto herself -> creates the edge inrouterand, - sends
channel_updateto Bob. - sends
node_announcementto Bob.
For Bob he would do the same. However, there's no guarantee the following won't happen,
- Bob receives
channel_ready, finishes the sending ofchannel_announcementandchannel_updateto himself, and sendschannel_updateandnode_announcementto Alice. - Alice receives Bob's
channel_updatewhile still in the process of handlingchannel_ready, in other words, there's no node info in her db. - Alice would cache this
channel_updateto be processed later, as fixed in #6054.
which is the case shown in the logs,
===> Alice received Bob's ChannelUpdate
2024-06-26 14:22:35.524 [DBG] PEER: Peer(02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a): Received ChannelUpdate(chain_hash=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206, short_chan_id=509073883725824, mflags=00000001, cflags=00000000, update_time=2024-06-26 14:22:35 +0200 CEST) from 02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a@127.0.0.1:11702
===> Alice received Bob's NodeAnnouncement
2024-06-26 14:22:35.525 [DBG] PEER: Peer(02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a): Received NodeAnnouncement(node=02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a, update_time=2024-06-26 14:22:35 +0200 CEST) from 02333e1d916b7d38b1a920eea1662c8e0723eb5f2604bd51d31fab9a97dda2775a@127.0.0.1:11702
===> Alice received her own ChannelAnnouncement
2024-06-26 14:22:35.529 [DBG] DISC: Processing ChannelAnnouncement: peer=<nil>, short_chan_id=509073883725824
===> Alice received her own ChannelUpdate
2024-06-26 14:22:35.531 [DBG] DISC: Processing ChannelUpdate: peer=<nil>, short_chan_id=509073883725824,
Alice should have done the same for node_announcement, which I though it was fixed in #7264 but apparently not.
So to properly fix it, we can,
- Fix it in the funding manager, that we only send
node_announcementafter we've received achannel_update, which is fixed from Bob's side. But this doesn't completely fix it as we don't know what other implementations would do. - Cache
node_announcementlike how we handle earlychannel_update.