Add max_no_channel_peers to UserConfig
The MAX_NO_CHANNEL_PEERS constant provides helpful protection from an excessive number of peer connections, but as a constant, isn't user modifiable. The desired number of non-channel peers may vary depending on the application, so here we make it configurable.
How many do y'all want? I'm of half a mind to just 10x it and call it a day.
Today, 2500 -- tomorrow, THE WORLD!
We currently run a fork with 2500. This is presently plenty of headroom, but I might want to tweak it down...
right. depending on the node's purpose and planned usage, a node operator might want to configure this to be higher or lower. Given how this impacts resource usage, it makes sense for this to be a tunable parameter.
Given how this impacts resource usage, it makes sense for this to be a tunable parameter.
I'm not entirely sure that it does impact resource usage that much. The maximum number of pre-funded channels definitely does, but no-channel peers are incredibly cheap (in ChannelManager). They may cost bandwidth for gossip, but IMO that is best dealt with at the gossip layer, not in the peer count limit.
no-channel peers are incredibly cheap
Well, would no limit at all be better?
The options I can think of are (1) no limit, (2) a one-size-fits-all limit, or (3) a customizable limit with a sensible default.
Well, would no limit at all be better?
I mean, there is some cost, just not a hell of a lot. Not sure that actually no limit is the right answer given there is some non-zero cost, even if its tiny.
Maybe leave the current limit as default, make it configurable, and then just add to the doc comment there is a low cost to no-channel peers, but may lead to a higher cost at gossip/bandwidth and use responsibly etc.
Maybe leave the current limit as default, make it configurable, and then just add to the doc comment there is a low cost to no-channel peers, but may lead to a higher cost at gossip/bandwidth and use responsibly etc.
SGTM, of course. I think this PR will get you that, minus the doc comment.
Hey, I wanted to bump this. I'm happy to rebase it or whatever is necessary. I think the bigger blocker is consensus that this should be configurable.
The main thing we don't seem to know/agree on above is the "cost" of a peer. I'd love to bring some data on the cost of a peer so that we can decide on a number, but it's not fixed. Two things can vary:
- How much your typical peer "costs."
- For example, last year, our node had an issue with a large population of peers that were sending a from-zero gossip sync every time they connected. We got that to change, and reduced peer-handling resource costs by 99%.
- The latter is a bug, of course, but any kind of designed behavior could emerge in a Lightning client on either side of peer connection (ourselves, or a peer) that makes this cost increase/decrease.
- What you are able to "pay."
- Fairly obvious here, but enterprisey routing nodes or custodial wallets have far greater resource access and can therefore support forwarding gossip or answering pings for a much larger population of non-counterparty peers than say, someone bootstrapping without a lot of financial support. For the latter, a fixed amount of 2500 may be far too high and lead to resource exhaustion.
Okay, finally coming back to this post-0.2, sorry about that.
I'm still not really convinced it makes sense to add a config knob here. IMO config knobs are not free - we generally expect downstream devs to consider the knobs that exist and set them correctly, with sufficient guidance that we hope its somewhat straightforward to do so. For something like max peers, we (LDK) probably have more knowledge about the "cost" of peers than our average downstream dev, implying we should be the one setting it, not them (though that's certainly not universally the case, and I'm sure y'all specifically know way more than we do in this instance!).
You raise specific concerns over bugs which had to have peer count limited, but I don't really think a config knob on peer limits is the right answer there. Rather, in this specific instance the right answer is probably either a custom message handler that limits peers with specific feature flags or an intercepting message handler on gossip messages that disconnects peers when they're flooding us with gossip. In the general case, I'd hope emergency deployments which attempt to fix DoS issues would do something similar to one of those - temporarily targeting the specific peers causing load (until we can fix the DoS risk) rather than something as broad as a total peer count limit (though that's also straightforward to implement in a custom message handler, if really needed).
As for different node deployments, I'm not sure its as relevant here as you're thinking - basically the only nodes that should be accepting incoming connections are routing nodes that should be very comfortable with 2000+ connections. Maybe there's a difference in that an average routing node would only be comfortable with 1000 connections but a more enterprisey routing node is comfortable with 2500 connections, but is that really worth a config flag? If we just default the config to 1000 would anyone ever actually change it?
I am still worried we're addressing an issue in the wrong place. The reason we have connection limiting in ChannelManager is to avoid disconnecting peers we have channels with while still limiting the connection count to something less than our maximum file descriptors. Maybe we should make that more explicit in channelmanager.rs and actually fetch the file descriptor limit and limit peers to that / 2. We should likely independently add a limit for the number of peers we'll accept gossip from, disconnecting peers that spam us with crap when we didn't request it.
Also happy to hop on a call and discuss this live rather than going back and forth endlessly in messages.