Download errors on smoke tests
The smoke tests are showing download errors on both dev-bee-gateway and bee-gateway. The errors have been present for a while and are intermittent. The spike reached 66.8% on dev-bee-gateway and 86.8% on bee-gateway.
We need to investigate why these errors occur and what triggers them. In particular, we need to answer the following questions:
- What triggers these errors?
- What percentage of all downloads fail?
- Are the errors specific to certain node operators, or can any node operator experience them?
- What are the possible solutions?
As the network grows, we should expect these issues to increase. We need to make sure we know why this happens and be proactive about this.
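One way to approach the percentage and per-operator questions is to have the smoke test record one result per download attempt and aggregate afterwards. The following is a minimal sketch, not the existing smoke-test code: the `(operator, ok)` record shape and the function name `failure_percentages` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def failure_percentages(
    results: Iterable[Tuple[str, bool]],
) -> Tuple[float, Dict[str, float]]:
    """Aggregate download results into failure percentages.

    `results` is a hypothetical record format: (operator_id, download_succeeded).
    Returns the overall failure percentage and a per-operator breakdown.
    """
    totals: Dict[str, int] = defaultdict(int)
    failed: Dict[str, int] = defaultdict(int)
    for operator, ok in results:
        totals[operator] += 1
        if not ok:
            failed[operator] += 1

    attempts = sum(totals.values())
    # Avoid dividing by zero when no downloads were recorded.
    overall = 100.0 * sum(failed.values()) / attempts if attempts else 0.0
    per_operator = {op: 100.0 * failed[op] / totals[op] for op in totals}
    return overall, per_operator
```

If failures cluster on a few operators, the per-operator map will show it; if they are spread evenly, the cause is more likely to be network-wide.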
If you're doing single-chunk verifications, it would be good to track which neighborhoods are failing. This doesn't work so well on BMT (/bytes) or Mantaray (/bzz) downloads, because the reference is only the tip of the iceberg: the chunk that actually has the error can be any chunk below the main reference.
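Tracking failing neighborhoods for single-chunk checks could look roughly like the sketch below. It assumes that a neighborhood is identified by the leading `depth` bits of the chunk address and that addresses are available as hex strings; the `depth` value and both function names are illustrative, not part of any existing tool.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def neighborhood(address_hex: str, depth: int) -> str:
    """Return the leading `depth` bits of a hex chunk address as a bit string."""
    width = len(address_hex) * 4  # 4 bits per hex digit
    bits = bin(int(address_hex, 16))[2:].zfill(width)
    return bits[:depth]


def failure_rate_by_neighborhood(
    results: Iterable[Tuple[str, bool]], depth: int
) -> Dict[str, float]:
    """Map each neighborhood prefix to its fraction of failed chunk downloads.

    `results` is a hypothetical record format: (chunk_address_hex, succeeded).
    """
    totals: Counter = Counter()
    failed: Counter = Counter()
    for address_hex, ok in results:
        prefix = neighborhood(address_hex, depth)
        totals[prefix] += 1
        if not ok:
            failed[prefix] += 1
    return {prefix: failed[prefix] / totals[prefix] for prefix in totals}
```

A neighborhood whose failure rate stays high across runs is a candidate for closer inspection, whereas failures scattered across all prefixes point away from a neighborhood-specific cause.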
Also, I've got a bunch of pinners running for older versions of the OSM map tile sets that lost their pins in the early days of the localstore. Many of these chunks are actually no longer available in the swarm, so your nodes may be seeing retrieval failures in their metrics that are relayed from my attempts. Of course, if your smoke test is doing its own download failure detection, then it wouldn't be my relays, but if you're only looking at metrics, I suspect a large part of the swarm is reflecting my remote retrieval failures.
We are going to include more logging about download errors in the ph4 release.
We are also going to add a new smoke test that checks for broken neighborhoods by uploading and downloading chunks mined for different neighborhoods.
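"Mining" a chunk for a neighborhood can be sketched as a brute-force search: vary the payload until its address starts with the target bit prefix. The example below is only an illustration of that loop. It uses SHA-256 as a stand-in address function because the Python standard library has no Keccak-256; real Swarm chunk addresses are BMT hashes, so a real test would have to plug in the actual address function. The payload format, `depth`, and all function names are assumptions.

```python
import hashlib
from typing import Callable, Dict


def sha256_address(payload: bytes) -> bytes:
    # Stand-in only: this is NOT how Swarm derives chunk addresses.
    return hashlib.sha256(payload).digest()


def prefix_bits(address: bytes, depth: int) -> str:
    """Return the leading `depth` bits of an address as a bit string."""
    bits = "".join(f"{byte:08b}" for byte in address)
    return bits[:depth]


def mine_chunk(
    target: str,
    address_of: Callable[[bytes], bytes] = sha256_address,
    max_tries: int = 1_000_000,
) -> bytes:
    """Search for a payload whose address starts with the bit string `target`."""
    for nonce in range(max_tries):
        payload = b"smoke-test-" + nonce.to_bytes(8, "big")
        if prefix_bits(address_of(payload), len(target)) == target:
            return payload
    raise RuntimeError(f"no payload found for prefix {target!r}")


def mine_for_all_neighborhoods(depth: int) -> Dict[str, bytes]:
    """Mine one payload for each of the 2**depth neighborhood prefixes."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    targets = [format(i, f"0{depth}b") for i in range(2 ** depth)]
    return {target: mine_chunk(target) for target in targets}
```

The smoke test would then upload each mined payload, download it again, and report the prefixes for which the round trip fails. The upload and download calls are left out here because they depend on the node API the test uses.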
Observations: https://hackmd.io/BIE--q9-QnepG24S0FvVwg