grin
grin copied to clipboard
Loosing peers and troubles syncing.
We found a bug in code where peers are banned for no reason. This happens if you start syncing from zero or had a longer pause. After a while peers get banned for no reason.
The error is in header_sync.rs -> *stalling_ts Which does not recognize by peer and ban any peer for no reason. You can track this error with a small debug info in code.
let test = stalling_ts + Duration::seconds(120);
if let Some(ref peer) = self.syncing_peer {
debug!(
"sync_state: in ban loop {}, now {}, stuck max {}",
peer.info.addr,
now,
test
);
}
... stalling_ts never updated correctly
We track this issue for a while on epic cash too.
Here is the bugfix -> https://gitlab.com/epiccash/epic/-/merge_requests/78/diffs#diff-content-725373f81536aa7f86f634ae14cfcb9b7878e709
greetz from epic dev Team.
Hi thanks for bringing this to our attention. We will investigate.
stalling_ts never updated correctly
Is this actually true though?
We (at least in theory) reset stalling_ts
after receiving all the expected headers (can't vouch for this actually working as intended though) -
if all_headers_received {
// reset the stalling start time if syncing goes well
self.stalling_ts = None;
} else {
ok lets explain why code does not work.
to get into the ban loop the peer is 2 seconds behind height. This happens very often. 2 Seconds are to low.
if header_head.height > latest_height {
self.prev_header_sync =
(now + Duration::seconds(**2**), header_head.height, prev_height);
}
... so after this happens peer go into if statement and the stalling time is set once and never reset only if we receive all headers from this peer and stalling is false.
if force_sync || all_headers_received || stalling {
if stalling {
if self.stalling_ts.is_none() { <-- stalling could never be update by this line.
self.stalling_ts = Some(now);
}
} else {
self.stalling_ts = None;
}
...
}
So we go many times into the if statement because the stalling time of 2 seconds is very low. Next it happens that we take longer than 120 seconds to receive all headers by this peer which bans the peer.
if all_headers_received {
....
} else {
-> here we go
if let Some(ref stalling_ts) = self.stalling_ts {
...
if now > *stalling_ts + Duration::seconds(120)
-> peer is banned if 120 seconds are gone
so in the end you can say if the peer does not send you all 512 headers (MAX_BLOCK_HEADERS) within 120 seconds then peer is banned.
512 headers should not take 120 seconds. That's outrageous. We're only talking about 200KB
yes i also removed this line too Inside stalling if statement.
self.syncing_peer = None;
You seem to be misunderstanding this important DoS prevention feature. It would be wise to take this conversation offline. Feel free to message me on Telegram.
Yes then i don't understand what banning peers randomly prevents dos attacks. Here is my address. you can send me further info how this works. https://t.me/poolboy51
and the stalling time is set once and never reset only if we receive all headers from this peer and stalling is false.
Ah ok that makes sense I think. So basically its only reset in the "happy path" and never reset if things are delayed (for any reason).
Let me take another look over the code here.
And based on the discussion above, I'm not convinced your fix is necessarily the best way to go about this.
Somehow related to it https://github.com/mimblewimble/grin/pull/3306 however my issue is opposite - we don't ban unresponsive peers.