A timeout is required when fetching blocks
GetBlocks requires a fetch timeout for each block.
The log below comes from a test that simulates adverse conditions: 10 clients connected to one server, with a speed limit applied. Each PROGRESS line shows one value per client.
2025/04/09 17:24:44 PROGRESS: [6.48 6.52 0.36 5.92 6.73 6.07 6.73 6.07 3.15 1.06]
2025/04/09 17:24:45 PROGRESS: [6.48 6.73 0.36 5.92 6.77 6.11 6.73 6.27 3.15 1.06]
2025/04/09 17:24:46 PROGRESS: [6.48 6.92 0.36 6.07 6.96 6.15 6.73 6.46 3.15 1.06]
2025/04/09 17:24:47 PROGRESS: [6.5 6.94 0.36 6.27 6.96 6.27 6.73 6.48 3.15 1.06]
2025/04/09 17:24:48 PROGRESS: [6.69 6.94 0.36 6.3 6.96 6.46 6.92 6.48 3.15 1.06]
Three clients (the ones at 0.36, 3.15, and 1.06) are stuck and unable to download.
When GetBlocks is running and a specific block takes too long to fetch, the fetch is not cancelled and simply hangs. To improve this, the fetch should stop and look for another peer.
boxo version: v0.29.1
Specifically, the problem occurred when one server (the bootstrap node) held the files and hundreds of clients tried to download them simultaneously. Some clients downloaded successfully, but most got stuck and could not finish. I expected the nodes that received a block first to forward it to other nodes, but that didn't happen. periodicSearchDelay also has no effect once block reception has started.
Triage notes:
- @gammazero, we need to lean on you as the person most familiar with this: what are the next steps here?
Investigating: This may just be a documentation issue about how to set a timeout.
Triage notes:
- Point to an existing example, if any, or add one.
I temporarily worked around this by adding a timer that is initialized when the channel is received from GetBlocks, plus a retry routine that runs after the context is cancelled.
I will try to put together some sample code soon if I have time, though my approach is not great. It seems like it would be better to change something in bitswap itself, but I haven't gotten that far.
@jclab-joseph any reason why you can't pass a context.WithTimeout? This is the idiomatic way of making tasks time-bound, no?
https://github.com/ipfs/boxo/blob/a19e342de9c63fcf55eee628ed498b64eeefc6cd/bitswap/bitswap.go#L28-L29
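For reference, a minimal sketch of what a single deadline on the whole request looks like. It does not use boxo: fetchAll is a hypothetical stand-in for a GetBlocks-style call, with string keys in place of CIDs and made-up per-key delays, so it only illustrates the context behaviour and not how bitswap schedules requests.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// demoTimeout is an arbitrary value chosen for this illustration.
const demoTimeout = 200 * time.Millisecond

// fetchAll is a hypothetical stand-in for a GetBlocks-style call (not the
// boxo API). It fetches every key concurrently, waits delay[k] to simulate
// the network, and stops delivering as soon as ctx is done.
func fetchAll(ctx context.Context, keys []string, delay map[string]time.Duration) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for _, k := range keys {
		wg.Add(1)
		go func(k string) {
			defer wg.Done()
			select {
			case <-time.After(delay[k]): // simulated transfer time
			case <-ctx.Done():
				return
			}
			select {
			case out <- k:
			case <-ctx.Done():
			}
		}(k)
	}
	// Close the channel once every per-key goroutine has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// countWithDeadline puts one deadline on the whole request and reports how
// many keys arrived before it expired.
func countWithDeadline(timeout time.Duration) int {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	keys := []string{"a", "b", "c"}
	// "b" simulates a peer that does not answer in time.
	delay := map[string]time.Duration{"a": 0, "b": time.Hour, "c": 0}

	n := 0
	for range fetchAll(ctx, keys, delay) {
		n++
	}
	return n
}

func main() {
	fmt.Println("blocks received:", countWithDeadline(demoTimeout))
}
```

In this sketch two of the three keys arrive and the stalled one is dropped when the deadline fires. The deadline covers the request as a whole, so it bounds the total wait but says nothing about an individual block that has stopped making progress.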
@lidel That approach cancels the fetch for all blocks. The issue is this situation:
- Block found from Peer-A.
- Block is requested from Peer-A.
- But there is no response from Peer-A.
So as a workaround, I passed a cancelable context to GetBlocks, canceled it when the timeout occurred, and then called GetBlocks again for the remaining blocks.
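The workaround described above could look roughly like the following sketch. It is not the code used in the report and does not import boxo: getBlocksFunc is a hypothetical stand-in with the same general shape as GetBlocks (string keys instead of CIDs), newFlakyGet is a fake fetcher invented for the demo, and the stall timer restarting after every received block is an assumption about how the timer was meant to work.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// getBlocksFunc is a hypothetical stand-in shaped like a GetBlocks call.
type getBlocksFunc func(ctx context.Context, keys []string) (<-chan string, error)

// fetchWithStallTimeout calls get with a cancelable context, cancels the
// attempt when no block arrives within stall, and then requests the keys
// that are still missing, for at most maxAttempts attempts.
func fetchWithStallTimeout(ctx context.Context, get getBlocksFunc, keys []string,
	stall time.Duration, maxAttempts int) ([]string, error) {
	missing := make(map[string]bool, len(keys))
	for _, k := range keys {
		missing[k] = true
	}
	got := make([]string, 0, len(keys))

	for attempt := 0; attempt < maxAttempts && len(missing) > 0; attempt++ {
		var remaining []string
		for _, k := range keys {
			if missing[k] {
				remaining = append(remaining, k)
			}
		}
		attemptCtx, cancel := context.WithCancel(ctx)
		ch, err := get(attemptCtx, remaining)
		if err != nil {
			cancel()
			return got, err
		}
	recv:
		for {
			select {
			case k, ok := <-ch:
				if !ok {
					break recv // channel closed: this attempt is over
				}
				if missing[k] {
					delete(missing, k)
					got = append(got, k)
				}
			case <-time.After(stall): // a fresh timer on every loop iteration
				break recv // nothing arrived for a full stall period
			case <-ctx.Done():
				cancel()
				return got, ctx.Err()
			}
		}
		cancel() // abandon whatever is still pending in this attempt
	}
	if len(missing) > 0 {
		return got, fmt.Errorf("%d blocks still missing after %d attempts", len(missing), maxAttempts)
	}
	return got, nil
}

// newFlakyGet returns a fake fetcher whose first call hangs on the key
// stuck, simulating a peer that never responds. Later calls deliver all keys.
func newFlakyGet(stuck string) getBlocksFunc {
	calls := 0
	return func(ctx context.Context, keys []string) (<-chan string, error) {
		calls++
		first := calls == 1
		out := make(chan string)
		go func() {
			defer close(out)
			for _, k := range keys {
				if first && k == stuck {
					<-ctx.Done() // hang until the caller gives up
					return
				}
				select {
				case out <- k:
				case <-ctx.Done():
					return
				}
			}
		}()
		return out, nil
	}
}

// runRetryDemo returns the number of blocks fetched, or -1 on error.
func runRetryDemo() int {
	got, err := fetchWithStallTimeout(context.Background(), newFlakyGet("b"),
		[]string{"a", "b", "c"}, 100*time.Millisecond, 3)
	if err != nil {
		return -1
	}
	return len(got)
}

func main() {
	fmt.Println("blocks fetched:", runRetryDemo())
}
```

Whether a repeated call actually reaches a different peer depends on bitswap internals, which this sketch does not model; it only shows the cancel-and-retry control flow on the caller's side.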
Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.