implement rate-limiting in `puller` package
Task
Excessive puller activity has been observed to generate a lot of IO and CPU usage. We should therefore establish:
- the amount of IO incurred by each operation
- the CPU throughput of the recovery operations
- how many puller operations per second it takes for the CPU to become a bottleneck
Once we have those numbers, we can proceed to limit the number of operations the puller performs.
Acceptance criteria
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue gets really bad when something causes the node's depth to decrease substantially in a short time, for instance if IPv4 connectivity is lost but IPv6 connectivity remains.
> IPv4 connectivity is lost but IPv6 connectivity remains.
huh?
This happened to a VPS that I'm running. The provider thought my IPFS node was attacking the local IPv4 network and blocked that access, but the machine's IPv6 address kept working without issue. The bee node on that VPS dropped its depth from 9 to 6 because it could now only talk to IPv6-capable peers. Once we managed to get the IPv4 block removed and restarted the bee node, it went back to depth 8 or 9 (although today both of the bee nodes on that machine are running at depth 7 for some reason, probably needing a restart to learn about new peers).
I should also mention that, in my observations, when the depth of a node drops significantly, the puller becomes overactive, thinking it needs to pull far more chunks than it really needs to maintain locally. Hence my "issue gets really bad" comment, which calls out one way I've seen a quick drop in depth cause excessive puller activity.
I'm not sure there's anything we can do about this specific edge case. The even bigger problem is that IPv6 connectivity is not necessarily available on all providers, and there's still a lot of 4 <-> 6 translation going on AFAIK. IMO, if you only have IPv6 connectivity, then the kademlia depth is correct and so is syncing. The only thing I can imagine here is introducing libp2p relays, but that would be a huge change, and we would first need to evaluate whether and how they fit into our network topology design.