
Tracking: security: Limit ability of synthetic nodes to take up connection slots. Credit: Ziggurat Team

Open mpguerra opened this issue 2 years ago • 3 comments

Motivation

"RT-S1 f3" from "Red Team report on Zcash testnet"

Once a connection is established with a synthetic node, a connection can remain active forever. That could easily lead to denial of service when all connection slots would be occupied by synthetic nodes and rightful users wouldn’t have access to the node. The problem arises when we take into account that synthetic nodes can do almost nothing so it’s cheap (that’s why it can increase the risk of exploitation to medium) to set up thousands of nodes just to connect to many nodes and occupy the connection slots.

Also "Observation #5" from "Red Team report on Zcash testnet":

Synthetic nodes can keep the connection alive without doing any actual blockchain work forever. Sometimes a synthetic node does get disconnected, but a quick reconnect solves the issue.

Specifications

No response

Complex Code or Requirements

There are two scenarios here:

  1. no load: peers that do nothing, provide no useful information, and aren't syncing from us should get disconnected
  2. overload: peers that block readiness by constantly sending inbound requests should get disconnected (follow-up to #7772: the existing overload defence is global, and the only peer-specific defence is the heartbeat)
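
As a rough illustration of the two scenarios above, here is a minimal sketch of a per-peer check. Everything in it is hypothetical: `PeerActivity`, `Verdict`, `classify`, and both thresholds are made up for this example and are not Zebra's types or constants. It only shows how "no load" and "overload" could be decided per peer, rather than globally.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-peer bookkeeping (not a Zebra type).
#[derive(Debug, Clone)]
pub struct PeerActivity {
    /// Last time the peer gave us something useful (a block, transaction, or address).
    pub last_useful_response: Instant,
    /// Last time the peer asked us for data in a way that looks like syncing, if ever.
    pub last_sync_request: Option<Instant>,
    /// Inbound requests received from this peer in the current measurement window.
    pub inbound_requests_in_window: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Keep,
    DisconnectIdle,
    DisconnectOverload,
}

/// Assumed thresholds, chosen only for illustration.
pub const MAX_IDLE: Duration = Duration::from_secs(10 * 60);
pub const MAX_INBOUND_REQUESTS_PER_WINDOW: u32 = 1_000;

/// Classifies a single peer at time `now`.
pub fn classify(peer: &PeerActivity, now: Instant) -> Verdict {
    // Scenario 2 (overload): the peer floods us with inbound requests.
    if peer.inbound_requests_in_window > MAX_INBOUND_REQUESTS_PER_WINDOW {
        return Verdict::DisconnectOverload;
    }

    // Scenario 1 (no load): nothing useful received, and the peer isn't syncing from us.
    let useful_recently =
        now.saturating_duration_since(peer.last_useful_response) <= MAX_IDLE;
    let syncing_recently = peer
        .last_sync_request
        .map_or(false, |t| now.saturating_duration_since(t) <= MAX_IDLE);

    if useful_recently || syncing_recently {
        Verdict::Keep
    } else {
        Verdict::DisconnectIdle
    }
}
```

In this sketch the overload check runs first, so a flooding peer is dropped even if it occasionally sends useful data. Whether that ordering is right is a design question for the team.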

Potential Fixes

Before starting work, discuss the impact of each potential fix on this vulnerability with the team, and decide on 1-3 fixes that have the greatest impact:

  • [ ] Increase the minimum number of connected peers to the initial target peerset size, so that Zebra always has a variety of peers, and the first/newest/fastest peers don't take up all the slots (also helps with #7824) https://github.com/ZcashFoundation/zebra/blob/7e7f989545bac06ee674547425ac0ac907cc0bce/zebra-network/src/peer_set/initialize.rs#L913
  • [ ] Retry failed peers only if we've previously connected to them successfully; don't trust untrusted_last_seen https://github.com/ZcashFoundation/zebra/blob/7e7f989545bac06ee674547425ac0ac907cc0bce/zebra-network/src/meta_addr.rs#L639-L640
  • [ ] Retry other failed peers (untrusted or no last seen time) if their last attempt was longer ago than MAX_PEER_ACTIVE_FOR_GOSSIP (or maybe just 10+ minutes?) but more recent than MAX_RECENT_PEER_AGE
  • [ ] #7852
  • [ ] #1875
  • [ ] If we're not getting any new blocks in the progress task, send a ChangePeerConnections request to the network to disconnect some peers
  • [ ] Increase GET_ADDR_FANOUT to 2 or 3, to increase the diversity of the address book
  • [ ] Reduce the TimerCrawl interval to increase the diversity of the address book
  • [ ] TODO: add more fixes?
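
The two retry-related fixes in the list above could be combined into a single eligibility check. The sketch below is an assumption-laden illustration, not Zebra's implementation: `FailedPeer`, `RetryLimits`, and `should_retry` are hypothetical names, and the two limits are passed in as parameters that stand in for MAX_PEER_ACTIVE_FOR_GOSSIP and MAX_RECENT_PEER_AGE, without assuming their real values.

```rust
use std::time::Duration;

/// Hypothetical limits, standing in for MAX_PEER_ACTIVE_FOR_GOSSIP and
/// MAX_RECENT_PEER_AGE. The real values are not assumed here.
pub struct RetryLimits {
    pub min_wait_unverified: Duration,
    pub max_age_unverified: Duration,
}

/// Hypothetical view of a failed address book entry (not a Zebra type).
pub struct FailedPeer {
    /// True only if we ourselves completed a connection to this peer in the past.
    /// A gossiped, untrusted "last seen" time does not count.
    pub ever_connected: bool,
    /// Time elapsed since our last connection attempt.
    pub since_last_attempt: Duration,
}

/// Decides whether a failed peer is eligible for another connection attempt.
pub fn should_retry(peer: &FailedPeer, limits: &RetryLimits) -> bool {
    if peer.ever_connected {
        // We have first-hand evidence that this peer was reachable.
        return true;
    }

    // Unverified peers are retried only inside the window:
    // long enough ago to back off, but not so old that the entry is stale.
    peer.since_last_attempt > limits.min_wait_unverified
        && peer.since_last_attempt < limits.max_age_unverified
}
```

For example, with a lower limit of 10 minutes (the alternative floated above) an unverified peer that failed 5 minutes ago would be skipped, while one that failed an hour ago would be retried if the upper limit is longer than that.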

Testing

No response

Related Work

No response

mpguerra, Oct 25 '23 14:10

A good first step would be listing our existing defences against this attack, for inbound and outbound connections. The report is unclear about the difference.

@mpguerra who would you like to do that work?

teor2345, Oct 25 '23 19:10

Most solutions to #7824 also improve this issue for outbound connections (but don't impact inbound connections).

teor2345, Oct 25 '23 22:10

Hey team! Please add your planning poker estimate with Zenhub @arya2 @oxarbitrage @teor2345 @upbqdn

mpguerra, Dec 01 '23 10:12