Tracking: security: Limit ability of synthetic nodes to take up connection slots. Credit: Ziggurat Team
Motivation
"RT-S1 f3" from "Red Team report on Zcash testnet"
Once a connection is established with a synthetic node, a connection can remain active forever. That could easily lead to denial of service when all connection slots would be occupied by synthetic nodes and rightful users wouldn’t have access to the node. The problem arises when we take into account that synthetic nodes can do almost nothing so it’s cheap (that’s why it can increase the risk of exploitation to medium) to set up thousands of nodes just to connect to many nodes and occupy the connection slots.
Also "Observation #5" from "Red Team report on Zcash testnet":
Synthetic nodes can keep the connection alive without doing any actual blockchain work forever. Sometimes a synthetic node does get disconnected, but a quick reconnect solves the issue.
Specifications
No response
Complex Code or Requirements
There are two scenarios here:
- no load: peers that do nothing, provide no useful information, and aren't syncing from us should get disconnected
- overload: peers that block readiness by constantly sending inbound requests should get disconnected (follow-up to #7772: the existing overload defence is global, and the only peer-specific defence is the heartbeat)
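
As a rough illustration of the two scenarios above, here is a minimal per-peer sketch. Everything in it (`Policy`, `PeerStats`, `Verdict`, and the thresholds) is hypothetical and not taken from `zebra-network`; it only shows how an idle check and a per-peer request budget could sit side by side.

```rust
use std::time::{Duration, Instant};

/// Outcome of checking one peer (hypothetical type, not a Zebra API).
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Keep,
    /// "no load": the peer has done nothing useful for too long.
    DisconnectIdle,
    /// "overload": the peer sent too many inbound requests in one window.
    DisconnectOverload,
}

/// Hypothetical thresholds; real values would need to be agreed on by the team.
pub struct Policy {
    /// Longest time a peer may stay connected without useful activity.
    pub max_idle: Duration,
    /// Length of the window over which inbound requests are counted.
    pub window: Duration,
    /// Most inbound requests a single peer may send per window.
    pub max_requests_per_window: u32,
}

/// Per-peer bookkeeping (hypothetical).
pub struct PeerStats {
    last_useful: Instant,
    window_start: Instant,
    requests_in_window: u32,
}

impl PeerStats {
    pub fn new(now: Instant) -> Self {
        Self { last_useful: now, window_start: now, requests_in_window: 0 }
    }

    /// Record that the peer gave us useful data, or synced from us.
    pub fn record_useful(&mut self, now: Instant) {
        self.last_useful = now;
    }

    /// Record one inbound request, starting a new window if the old one expired.
    pub fn record_request(&mut self, now: Instant, policy: &Policy) {
        if now.duration_since(self.window_start) >= policy.window {
            self.window_start = now;
            self.requests_in_window = 0;
        }
        self.requests_in_window = self.requests_in_window.saturating_add(1);
    }

    /// Decide whether to keep the peer. Overload is checked first, and only
    /// counts while the current window is still open.
    pub fn verdict(&self, now: Instant, policy: &Policy) -> Verdict {
        let window_open = now.duration_since(self.window_start) < policy.window;
        if window_open && self.requests_in_window > policy.max_requests_per_window {
            Verdict::DisconnectOverload
        } else if now.duration_since(self.last_useful) > policy.max_idle {
            Verdict::DisconnectIdle
        } else {
            Verdict::Keep
        }
    }
}
```

Note that inbound requests deliberately do not reset the idle timer in this sketch, so a peer cannot stay connected just by sending cheap requests.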
Potential Fixes
Before starting work, discuss the impact of each potential fix on this vulnerability with the team, and decide on 1-3 fixes that have the greatest impact:
- [ ] Increase the minimum number of connected peers to the initial target peerset size, so that Zebra always has a variety of peers, and the first/newest/fastest peers don't take up all the slots (also helps with #7824) https://github.com/ZcashFoundation/zebra/blob/7e7f989545bac06ee674547425ac0ac907cc0bce/zebra-network/src/peer_set/initialize.rs#L913
- [ ] Retry failed peers only if we've successfully connected to them, don't believe `untrusted_last_seen` https://github.com/ZcashFoundation/zebra/blob/7e7f989545bac06ee674547425ac0ac907cc0bce/zebra-network/src/meta_addr.rs#L639-L640
- [ ] Retry other failed peers (untrusted or no last seen time) if their last attempt is longer than `MAX_PEER_ACTIVE_FOR_GOSSIP` (or maybe just 10+ minutes?) but less than `MAX_RECENT_PEER_AGE`
- [ ] #7852
- [ ] #1875
- [ ] If we're not getting any new blocks in the progress task, send a `ChangePeerConnections` request to the network to disconnect some peers
- [ ] Increase `GET_ADDR_FANOUT` to 2 or 3, to increase the diversity of the address book
- [ ] Reduce the `TimerCrawl` interval to increase the diversity of the address book
- [ ] TODO: add more fixes?
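
To make the two retry-related fixes in the list above easier to discuss, here is a minimal sketch of how they might combine into one eligibility check. `FailedPeer` and `should_retry` are hypothetical names, not the actual `MetaAddr` logic. The two bounds are passed in as parameters and stand in for `MAX_PEER_ACTIVE_FOR_GOSSIP` and `MAX_RECENT_PEER_AGE`, whose values are not assumed here.

```rust
use std::time::Duration;

/// What we know about a failed peer (hypothetical, much simpler than a real
/// address book entry).
pub struct FailedPeer {
    /// True only if we have completed a connection to this peer ourselves.
    /// Last-seen times gossiped by other peers are deliberately ignored.
    pub ever_connected: bool,
    /// Time elapsed since our last connection attempt to this peer.
    pub since_last_attempt: Duration,
}

/// Returns true if the failed peer should be retried.
///
/// - Peers we have connected to before are always eligible.
/// - Other peers are eligible only while their last attempt is older than
///   `min_retry_age` and newer than `max_retry_age`.
pub fn should_retry(
    peer: &FailedPeer,
    min_retry_age: Duration,
    max_retry_age: Duration,
) -> bool {
    if peer.ever_connected {
        return true;
    }
    peer.since_last_attempt > min_retry_age && peer.since_last_attempt < max_retry_age
}
```

Whether previously connected peers should also be subject to the age bounds is an open design question; this sketch assumes they are not.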
Testing
No response
Related Work
No response
A good first step would be listing our existing defences against this attack, for inbound and outbound connections. The report is unclear about the difference.
@mpguerra who would you like to do that work?
Most solutions to #7824 also improve this issue for outbound connections (but don't impact inbound connections).