
[WIP] BOLT 7: Inventory-based gossip

Open sstone opened this issue 6 years ago • 9 comments

This is a proposal for adding inventory messages to the gossip protocol. With the current design, nodes which receive gossip messages (node announcements, channel announcements and updates) will relay them to their peers, which will result in duplicates for nodes that are connected to many different peers.

Nodes which support the option_inv_gossip feature will instead broadcast inventory messages which contain identifiers for gossip messages they've received. Receiving nodes will compare these identifiers to their local view of the routing table, and ask for missing or outdated messages using channel queries. This implies that option_inv_gossip cannot be used without gossip_queries.

It builds upon PR #571 (unification of feature bits, which includes a definition for option_inv_gossip) and #557 (extended channel queries). More specifically, it is a "companion" PR to #557:

  • extended channel queries are used to efficiently synchronise routing tables between a node that is often offline and a very limited number of peers
  • inventory-based gossip is used to minimize duplicate gossip traffic for nodes that are connected to many different peers
  • they share a similar concern: how to efficiently advertise information about channel announcements and updates

In fact, the inventory message that I propose to use here is almost identical to an extended reply_channel_range message and includes short channel ids, checksums and timestamps, to allow receiving nodes to efficiently query messages based on content (checksum) and timestamps.
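For illustration, here is a minimal sketch in Go of what such an inventory entry and the receiver-side comparison could look like, assuming per-direction timestamps and CRC32C checksums as in the extended reply_channel_range. The names InvEntry and needsQuery are purely illustrative, not part of the proposal.

```go
// Hypothetical sketch of an inventory entry, mirroring the fields of an
// extended reply_channel_range: the short channel id plus, for each
// direction, the timestamp and checksum of the latest channel_update.
package main

import "fmt"

type InvEntry struct {
	ShortChannelID uint64
	Timestamp1     uint32 // latest channel_update for direction 0
	Timestamp2     uint32 // latest channel_update for direction 1
	Checksum1      uint32 // CRC32C over the update, minus signature and timestamp
	Checksum2      uint32
}

// needsQuery compares a peer's inventory entry against our local view and
// reports whether we should fetch the underlying messages via channel queries.
func needsQuery(local, remote InvEntry) bool {
	// A newer timestamp with a different checksum means our copy is outdated;
	// a matching checksum means the update only differs cosmetically (e.g. a
	// periodic refresh), so there is nothing worth fetching.
	stale1 := remote.Timestamp1 > local.Timestamp1 && remote.Checksum1 != local.Checksum1
	stale2 := remote.Timestamp2 > local.Timestamp2 && remote.Checksum2 != local.Checksum2
	return stale1 || stale2
}

func main() {
	local := InvEntry{ShortChannelID: 0x1234, Timestamp1: 1000, Checksum1: 0xAAAA}
	remote := InvEntry{ShortChannelID: 0x1234, Timestamp1: 2000, Checksum1: 0xBBBB}
	fmt.Println(needsQuery(local, remote)) // true: direction 0 is outdated
}
```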

Node announcements have been left out on purpose because they have a much more limited impact on bandwidth than channel updates, and because gains would be smaller (you would still need to advertise node ids which are 33 bytes long).

sstone avatar Feb 28 '19 15:02 sstone

Another thought was to let nodes opt out of updates for most channels from most peers. Since a node has a full map of the network, it could tell most peers not to send updates for a list of channels that it knows it will get more quickly from multiple other peers. Some info here: https://github.com/ACINQ/eclair/issues/864

n1bor avatar Mar 20 '19 20:03 n1bor

Fwiw we can already opt out entirely of updates on a peer-by-peer basis by not sending gossip_timestamp_filter.

pm47 avatar Mar 20 '19 20:03 pm47

Just some stats on the scale of the issue: https://lightningconductor.net/grafana/d/CZQF3HzWz/lightning-stats?orgId=1&from=now-7d&to=now&fullscreen&panelId=22 This is for node: 03c436af41160a355fc1ed230a64f6a64bcbd2ae50f12171d1318f9782602be601@mainnet.lightningconductor.net:9735

n1bor avatar May 01 '19 07:05 n1bor

What implementation and version is that node running? I see about an order of magnitude less traffic on my node personally.

Roasbeef avatar May 01 '19 08:05 Roasbeef

A two-week-old version of Eclair: https://github.com/ACINQ/eclair/commit/9032da5326a46a295c6581ef0ce6e3d84087da60 They did commit an optimisation a week ago; I will move onto that in a few days. I think my stats are good, as I am just scraping the log file with logstat, and I did double-check a few of the peaks and they were real. I am averaging 5.5k updates a minute... That Grafana site also shows the number of peers, which is about 180: https://lightningconductor.net/grafana/d/CZQF3HzWz/lightning-stats?fullscreen&panelId=14

n1bor avatar May 01 '19 18:05 n1bor

Those are orthogonal things: this PR reduces the bandwidth used by gossip, all other things being equal (even if the number of channel_updates remained the same).

@n1bor seems to have a pathological case of flapping channels, which caused eclair to generate a constant stream of channel_updates. That's an implementation problem, not a spec one, and our recent fix (https://github.com/ACINQ/eclair/commit/fb84dfb855ddc1ee0a42897351ac5506a28198da) alleviates this issue.

pm47 avatar May 01 '19 19:05 pm47

Updated to HEAD de5a782; let's see if it settles down in the next few hours.

UPDATE: Just to be clear, the 5k updates per minute are the total number of updates (mainly relays) sent by my node. It has about 180 peers, so that is about 30 updates sent to each peer each minute.

So this change would reduce the size of each update from about 140 bytes to about 13, and these are both sent and received. But just as importantly, it would reduce the huge number of small network packets we currently send/receive.
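Taking these figures at face value, a quick back-of-the-envelope calculation (the constants below are just the numbers quoted above, not independent measurements):

```go
// Rough bandwidth estimate using the figures quoted above: 5,500 relayed
// updates per minute, ~140 bytes per full channel_update vs. ~13 bytes per
// inventory entry. Purely illustrative arithmetic.
package main

import "fmt"

func main() {
	const (
		updatesPerMinute = 5500
		fullUpdateBytes  = 140.0 // approximate size of a relayed channel_update
		invEntryBytes    = 13.0  // approximate size of one inventory entry
	)
	before := updatesPerMinute * fullUpdateBytes / 1000 // KB per minute
	after := updatesPerMinute * invEntryBytes / 1000
	fmt.Printf("before: ~%.0f KB/min, after: ~%.0f KB/min (~%.0f%% saved)\n",
		before, after, 100*(1-after/before))
	// Output: before: ~770 KB/min, after: ~72 KB/min (~91% saved)
}
```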

n1bor avatar May 01 '19 21:05 n1bor

OK, I'm working on TLV parsing, so I've made some cleanups (and introduced some conventions). Nothing significant has changed in the format, except the removal of some redundant lengths inside TLV records.
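As a rough illustration of that last point (a simplified sketch, not the actual BOLT encoding: real records use BigSize varints for type and length, single bytes here for brevity):

```go
// A TLV record is (type, length, value). Because the outer length already
// delimits the value, a second length embedded inside the value is redundant,
// which is the kind of field being removed here.
package main

import "fmt"

func encodeTLV(typ byte, value []byte) []byte {
	rec := []byte{typ, byte(len(value))}
	return append(rec, value...) // the value carries no inner length of its own
}

func main() {
	fmt.Printf("%x\n", encodeTLV(0x01, []byte{0xde, 0xad})) // 0102dead
}
```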

rustyrussell avatar Jun 14 '19 06:06 rustyrussell

This needs a rebase. If we're going to add more stuff to the route-sync state machine, can we improve it? The current design is super complicated, and it's just kind of grown organically without a lot of thought about the end state. It has grown based on what's available without a lot of care for what's allowed (e.g. it relies heavily on sequence id fields being timestamps, which the spec doesn't even require, not to mention that stale updates may propagate late), and it uses zlib as a cheap way to communicate data instead of a purpose-designed thing like minisketch. At various points, I've suggested we move towards where Bitcoin Core is going and:

  • Start with an initial sync based on last-timestamp fields with a two-hour buffer (we need to constrain the timestamp fields to be at least after block header timestamps, but those are only +/- 2 hours, so we can't assume more than that).
  • Use minisketch (or just an inv list, if we don't want to take the dependency) to figure out what you need that was missed, maybe because it wasn't relayed in a timely fashion or because it's somewhere in the two-hour buffer.
  • Use invs thereafter.

This would remove a ton of existing complexity.
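For concreteness, here is one way that three-stage flow could be sketched. All names here (Peer, syncPeer, reconcileSet, localGossipIDs) are hypothetical stand-ins, not an actual API, and the minisketch machinery is elided behind a stub:

```go
// Sketch of the three-stage sync described above: timestamp-filtered catch-up,
// one round of set reconciliation, then inv-based steady state.
package main

import "time"

// Hypothetical stubs standing in for a real peer connection and gossip store;
// only the flow is the point of this sketch.
type Peer struct{}

func (p *Peer) sendGossipTimestampFilter(since time.Time) {}
func (p *Peer) queryMessages(ids []uint64)                {}
func reconcileSet(p *Peer, local []uint64) []uint64       { return nil }
func localGossipIDs() []uint64                            { return nil }

// Block header timestamps are only accurate to +/- 2 hours, so gossip
// timestamps cannot be trusted any more tightly than that.
const headerSlack = 2 * time.Hour

func syncPeer(p *Peer, lastSync time.Time) {
	// 1. Initial sync: request everything since the last sync, backed off by
	//    the two-hour buffer.
	p.sendGossipTimestampFilter(lastSync.Add(-headerSlack))

	// 2. Set reconciliation (minisketch, or a plain inv list) to catch what
	//    the filter missed: late relays, or updates inside the buffer.
	p.queryMessages(reconcileSet(p, localGossipIDs()))

	// 3. Steady state: peers announce new gossip via invs and we fetch only
	//    what we don't already have.
}

func main() {
	syncPeer(&Peer{}, time.Now().Add(-24*time.Hour))
}
```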

TheBlueMatt avatar Mar 06 '20 01:03 TheBlueMatt