
Multiple upload shallow push

Open ldeffenb opened this issue 9 months ago • 3 comments

Context

2.1.0-rc2 and earlier

Summary

When pushing single-chunk /bytes content into a lightly populated swarm (like the current Sepolia testnet), subsequent uploads of the same content log a shallow receipt depth (at debug level) and retry the chunk. If you keep re-pushing the same content, the logs eventually stop, but the node has in fact decided to store the chunk locally even though it is not in the target neighborhood.

Expected behavior

I would expect multiple uploads of the same content to be delivered along the same successful path into the swarm.

Actual behavior

Logs of the following type are generated. With additional logging in place, you can see that the node actually tries a different "closest" peer for each upload that is initiated.

"time"="2024-05-19 11:19:05.094188" "level"="error" "logger"="node/pusher" "msg"="pusher: failed checking receipt" "error"="pusher: shallow receipt depth 2, want at least 4, chunk_address d50c26a504619838ff1fd1cc86ed69b9b6b380f484b3a0444015df500322f4c6: shallow recipt" "chunk_address"="d50c26a504619838ff1fd1cc86ed69b9b6b380f484b3a0444015df500322f4c6"
"time"="2024-05-19 11:19:05.094206" "level"="debug" "logger"="node/storer" "msg"="direct upload: shallow receipt received, retrying" "chunk"="d50c26a504619838ff1fd1cc86ed69b9b6b380f484b3a0444015df500322f4c6"

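To make the "depth 2, want at least 4" message concrete, here is a minimal sketch in Go. It assumes that the receipt depth is the number of leading bits shared by the chunk address and the address of the peer that signed the receipt; that reading is my assumption and is not confirmed by the logs above. The function names and both peer prefixes are hypothetical; only the chunk prefix d50c comes from the log.

```go
package main

import (
	"fmt"
	"math/bits"
)

// proximity returns the number of leading bits that a and b have in common.
// Hypothetical helper for illustration, not Bee's own function.
func proximity(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if x := a[i] ^ b[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return n * 8
}

// isShallow reports whether a receipt signed by storer is shallower than
// wantDepth for the given chunk (assumed semantics, see the note above).
func isShallow(chunk, storer []byte, wantDepth int) bool {
	return proximity(chunk, storer) < wantDepth
}

func main() {
	chunk := []byte{0xd5, 0x0c} // first two bytes of the chunk address in the log
	far := []byte{0xe0, 0x00}   // hypothetical peer sharing only 2 leading bits
	near := []byte{0xd0, 0x00}  // hypothetical peer sharing 5 leading bits

	fmt.Println(proximity(chunk, far), isShallow(chunk, far, 4))   // 2 true
	fmt.Println(proximity(chunk, near), isShallow(chunk, near, 4)) // 5 false
}
```

With a wanted depth of 4, the first peer would produce exactly the kind of shallow receipt shown in the error line, while the second would be accepted.
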
Steps to reproduce

In the current lightly populated Sepolia testnet: download the attached d50c26a504619838ff1fd1cc86ed69b9b6b380f484b3a0444015df500322f4c6.zip, unzip it, and upload the d50c26a504619838ff1fd1cc86ed69b9b6b380f484b3a0444015df500322f4c6.bytes file it contains using the following command (adjust the URL if necessary):

curl -X POST -H "swarm-deferred-upload: false" -H "swarm-postage-batch-id: {yourStampID}" --data-binary @d50c26a504619838ff1fd1cc86ed69b9b6b380f484b3a0444015df500322f4c6.bytes http://localhost:1633/bytes

You should receive the same reference: {"reference":"d50c26a504619838ff1fd1cc86ed69b9b6b380f484b3a0444015df500322f4c6"}

The content is actually a mantaray manifest value node pointing to the following png file. You can see this with:

swarm-cli manifest list --bee-api-url http://localhost:1633 d50c26a504619838ff1fd1cc86ed69b9b6b380f484b3a0444015df500322f4c6

which should show you 61ef14a6d58f33f2b57d5f5d7f2e06b1c5dc0d8db652e405a78c99d8902681ce / 61ef14a6d58f33f2b57d5f5d7f2e06b1c5dc0d8db652e405a78c99d8902681ce

Possible solution

The issue, I believe, is the skiplist in pushsync.go. While the first push is happening, the closest peer/chunk pair is added to the skiplist for 5 MINUTES. On the second upload, the closest peer is skipped, causing the push to route to the next closest peer. That peer then also adds the closest peer to its skiplist for the same duration. As each upload is repeated, the push takes a more and more circuitous route to the actual closest peer in the swarm. This is exacerbated in the current Sepolia testnet, but it is equally inefficient in a more robust swarm like the mainnet.

https://github.com/ldeffenb/bee/blob/5ebc9b342f0a86532865a34769a09a67c1421a6c/pkg/pushsync/pushsync.go#L433
https://github.com/ldeffenb/bee/blob/5ebc9b342f0a86532865a34769a09a67c1421a6c/pkg/pushsync/pushsync.go#L48

Somehow, when a push succeeds with a receipt that is not shallow, the route taken (chunk/peer) for that success should be removed from the skiplist so that it can be reused on the next upload. However, only the pusher and/or netstore's DirectUpload know whether the peer used for the upload was good or whether they explicitly want to retry on a different peer.

Note that this is also related to #4680.

ldeffenb, May 19 '24 15:05