grin icon indicating copy to clipboard operation
grin copied to clipboard

Loosing peers and troubles syncing.

Open johanneshahn opened this issue 4 years ago • 8 comments

We found a bug in code where peers are banned for no reason. This happens if you start syncing from zero or had a longer pause. After a while peers get banned for no reason.

The error is in header_sync.rs -> *stalling_ts Which does not recognize by peer and ban any peer for no reason. You can track this error with a small debug info in code.

let test = stalling_ts + Duration::seconds(120);
if let Some(ref peer) = self.syncing_peer {
	debug!(
		"sync_state: in ban loop {}, now {}, stuck max {}",
		peer.info.addr,
		now,
		test
	);
}

... stalling_ts never updated correctly

We track this issue for a while on epic cash too.

Here is the bugfix -> https://gitlab.com/epiccash/epic/-/merge_requests/78/diffs#diff-content-725373f81536aa7f86f634ae14cfcb9b7878e709

greetz from epic dev Team.

johanneshahn avatar Apr 10 '20 08:04 johanneshahn

Hi thanks for bringing this to our attention. We will investigate.

stalling_ts never updated correctly

Is this actually true though?

We (at least in theory) reset stalling_ts after receiving all the expected headers (can't vouch for this actually working as intended though) -

if all_headers_received {
	// reset the stalling start time if syncing goes well
	self.stalling_ts = None;
} else {

antiochp avatar Apr 10 '20 16:04 antiochp

ok lets explain why code does not work.

to get into the ban loop the peer is 2 seconds behind height. This happens very often. 2 Seconds are to low.

if header_head.height > latest_height {
  self.prev_header_sync =
  (now + Duration::seconds(**2**), header_head.height, prev_height);

}

... so after this happens peer go into if statement and the stalling time is set once and never reset only if we receive all headers from this peer and stalling is false.

if force_sync || all_headers_received || stalling {

   if stalling {
	if self.stalling_ts.is_none() { <-- stalling could never be update by this line.
		self.stalling_ts = Some(now);
	}
  } else {
	self.stalling_ts = None;
  }


...
} 

So we go many times into the if statement because the stalling time of 2 seconds is very low. Next it happens that we take longer than 120 seconds to receive all headers by this peer which bans the peer.

                           if all_headers_received {
				....
			} else {
-> here we go
				if let Some(ref stalling_ts) = self.stalling_ts {
                                      ... 
                                      if now > *stalling_ts + Duration::seconds(120)
                                             -> peer is banned if 120 seconds are gone

so in the end you can say if the peer does not send you all 512 headers (MAX_BLOCK_HEADERS) within 120 seconds then peer is banned.

johanneshahn avatar Apr 11 '20 16:04 johanneshahn

512 headers should not take 120 seconds. That's outrageous. We're only talking about 200KB

DavidBurkett avatar Apr 12 '20 00:04 DavidBurkett

yes i also removed this line too Inside stalling if statement.

self.syncing_peer = None;

johanneshahn avatar Apr 12 '20 06:04 johanneshahn

You seem to be misunderstanding this important DoS prevention feature. It would be wise to take this conversation offline. Feel free to message me on Telegram.

DavidBurkett avatar Apr 12 '20 07:04 DavidBurkett

Yes then i don't understand what banning peers randomly prevents dos attacks. Here is my address. you can send me further info how this works. https://t.me/poolboy51

johanneshahn avatar Apr 12 '20 08:04 johanneshahn

and the stalling time is set once and never reset only if we receive all headers from this peer and stalling is false.

Ah ok that makes sense I think. So basically its only reset in the "happy path" and never reset if things are delayed (for any reason).

Let me take another look over the code here.

And based on the discussion above, I'm not convinced your fix is necessarily the best way to go about this.

antiochp avatar Apr 12 '20 10:04 antiochp

Somehow related to it https://github.com/mimblewimble/grin/pull/3306 however my issue is opposite - we don't ban unresponsive peers.

hashmap avatar Apr 21 '20 20:04 hashmap