
/bytes redundancy headers ignored

ldeffenb opened this issue 1 year ago

Context

Bee 2.1.0-rc2 on the Sepolia testnet

Summary

When doing a single-chunk /bytes retrieval, it seems that the redundancy headers are being ignored and multiple chunks at different references are being fetched and cancelled anyway.

Expected behavior

If I specify Swarm-Redundancy-Strategy: 0 and Swarm-Redundancy-Fallback-Mode: false, I would expect only the referenced chunk to be retrieved.

Actual behavior

For every request, multiple chunks are being fetched and cancelled per the logs below. I have confirmed that the headers are visible in the API based on the first log line added in my hacked version: https://github.com/ldeffenb/bee/blob/2ebe25af0c8238a37f9526584f31b0bd00b6296a/pkg/api/bzz.go#L538

This is without the redundancy headers:
curl http://192.168.10.38:1633/bytes/6f6aae5e8bda728b1d50594204376c53d73453b8d195ab7b602a4df8ccaa2ecd --output temp.tmp
"time"="2024-05-24 14:36:23.851095" "level"="debug" "logger"="node/api" "msg"="downloadHandler" "reference"="6f6aae5e8bda728b1d50594204376c53d73453b8d195ab7b602a4df8ccaa2ecd" "Strategy"=null "Fallback"=null
"time"="2024-05-24 14:36:23.856788" "level"="debug" "logger"="node/retrieval" "msg"="retrieval failed" "chunk_address"="ab2cfc3671935481246dd99f7e3829c0ef678d69171e6816ef50b94b5a16d011" "error"="context canceled"
"time"="2024-05-24 14:36:23.856969" "level"="debug" "logger"="node/retrieval" "msg"="retrieval failed" "chunk_address"="7bb8257947db81bd1722918a7bbd3c1ebd8483cc561a0e2c7521c627a11c7ecc" "error"="context canceled"

The same happens with the headers supplied (note the downloadHandler sees the values):
curl -H "Swarm-Redundancy-Strategy: 0" -H "Swarm-Redundancy-Fallback-Mode: false" http://192.168.10.38:1633/bytes/6f6aae5e8bda728b1d50594204376c53d73453b8d195ab7b602a4df8ccaa2ecd --output temp.tmp
"time"="2024-05-24 14:39:39.053943" "level"="debug" "logger"="node/api" "msg"="downloadHandler" "reference"="6f6aae5e8bda728b1d50594204376c53d73453b8d195ab7b602a4df8ccaa2ecd" "Strategy"=0 "Fallback"=false
"time"="2024-05-24 14:39:39.054327" "level"="debug" "logger"="node/retrieval" "msg"="retrieval failed" "chunk_address"="ab2cfc3671935481246dd99f7e3829c0ef678d69171e6816ef50b94b5a16d011" "error"="context canceled"
"time"="2024-05-24 14:39:39.054532" "level"="debug" "logger"="node/retrieval" "msg"="retrieval failed" "chunk_address"="7bb8257947db81bd1722918a7bbd3c1ebd8483cc561a0e2c7521c627a11c7ecc" "error"="context canceled"

Steps to reproduce

Find a single chunk reference (like 6f6aae5e8bda728b1d50594204376c53d73453b8d195ab7b602a4df8ccaa2ecd in sepolia testnet) and do a /bytes retrieval on it. Watch your --verbosity 5 (Debug) logs for other chunk_address references failing with context canceled.
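
For anyone reproducing this without curl, the sketch below is a minimal Go equivalent of the request (the node address and reference are the ones from this report). Run it and watch the debug logs for retrieval failures on chunk_address values other than the requested reference.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	const ref = "6f6aae5e8bda728b1d50594204376c53d73453b8d195ab7b602a4df8ccaa2ecd"

	req, err := http.NewRequest(http.MethodGet, "http://192.168.10.38:1633/bytes/"+ref, nil)
	if err != nil {
		panic(err)
	}
	// Strategy 0 with fallback disabled: only the referenced chunk should be fetched.
	req.Header.Set("Swarm-Redundancy-Strategy", "0")
	req.Header.Set("Swarm-Redundancy-Fallback-Mode", "false")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("temp.tmp")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	n, err := io.Copy(out, resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("status %s, wrote %d bytes\n", resp.Status, n)
}
```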

Possible solution

Not sure yet. I haven't figured out how to enable the redundancy logger and haven't had time to provide a known-enabled logger to see what's going on in there. https://github.com/ethersphere/bee/blob/8645aca8600408f9d9066460b756bc1c7e8b5c81/pkg/file/redundancy/getter/getter.go#L68
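
In the meantime, one crude way to see which addresses are actually being requested, without touching the package's own logger, might be to wrap whatever storage getter is handed to the joiner. This is only a sketch: the import paths and the storage.Getter shape (Get(ctx, address) returning a chunk and an error) are assumptions taken from the linked sources, not verified against 2.1.0-rc2.

```go
package debugstore

import (
	"context"
	"log"

	"github.com/ethersphere/bee/pkg/storage"
	"github.com/ethersphere/bee/pkg/swarm"
)

// loggingGetter wraps any storage.Getter and logs every chunk address that is
// actually requested through it, plus any error it returns.
type loggingGetter struct {
	inner storage.Getter
}

// Wrap returns a getter that logs each Get before delegating to g.
func Wrap(g storage.Getter) storage.Getter {
	return &loggingGetter{inner: g}
}

func (l *loggingGetter) Get(ctx context.Context, addr swarm.Address) (swarm.Chunk, error) {
	log.Printf("debug getter.Get: addr=%s", addr)
	ch, err := l.inner.Get(ctx, addr)
	if err != nil {
		log.Printf("debug getter.Get failed: addr=%s err=%v", addr, err)
	}
	return ch, err
}
```

Wrapping the getter wherever the joiner is constructed should then show exactly which addresses are fetched and subsequently cancelled.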

ldeffenb · May 24 '24 14:05

And this is what it does to the retrieval metrics. Yes, I'm asking for a lot of chunks (walking the OSM manifest reference), but there shouldn't be any failures, as all of the chunks are pinned locally. (That may be a separate problem depending on where the metric is incremented; this sure looks like it's going directly to the swarm and not checking the local pinned store.)

[screenshot: retrieval metrics]
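
For comparison, the behaviour expected around pinned data is roughly "local store first, network only on a miss". A purely illustrative sketch, with hypothetical types that are not bee's actual retrieval path:

```go
package sketch

import (
	"context"

	"github.com/ethersphere/bee/pkg/storage"
	"github.com/ethersphere/bee/pkg/swarm"
)

// localFirstGetter is a hypothetical composite getter: it consults the local
// (pinned/cached) store first and only falls through to the network retrieval
// protocol, and its failure metrics, on a miss.
type localFirstGetter struct {
	local   storage.Getter // pinned / cached chunks on this node
	network storage.Getter // retrieval protocol (goes out to the swarm)
}

func (g *localFirstGetter) Get(ctx context.Context, addr swarm.Address) (swarm.Chunk, error) {
	if ch, err := g.local.Get(ctx, addr); err == nil {
		return ch, nil // found locally: no swarm round-trip, no retrieval metric
	}
	return g.network.Get(ctx, addr)
}
```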

ldeffenb · May 24 '24 14:05

Late-breaking news: apparently the redundancy level isn't getting to the file joiner intact. I added the following log: https://github.com/ldeffenb/bee/blob/b50dc2780587e8a9c76bcfe9fcb0656dd8c37142/pkg/file/joiner/joiner.go#L112 and now get:

"time"="2024-05-24 15:37:46.922256" "level"="debug" "logger"="node/api" "msg"="downloadHandler" "reference"="6f6aae5e8bda728b1d50594204376c53d73453b8d195ab7b602a4df8ccaa2ecd" "Strategy"=0 "Fallback"=false
"time"="2024-05-24 15:37:46.922271" "level"="debug" "logger"="node/api" "msg"="joiner.New" "address"="6f6aae5e8bda728b1d50594204376c53d73453b8d195ab7b602a4df8ccaa2ecd" "rLevel"=4
"time"="2024-05-24 15:37:46.922370" "level"="debug" "logger"="node/api" "msg"="replicas.getter.Get: so<-next" "addr"=[123,184,37,121,71,219,129,189,23,34,145,138,123,189,60,30,189,132,131,204,86,26,14,44,117,33,198,39,161,28,126,204]
"time"="2024-05-24 15:37:46.922404" "level"="debug" "logger"="node/api" "msg"="replicas.getter.Get: so<-next" "addr"=[171,44,252,54,113,147,84,129,36,109,217,159,126,56,41,192,239,103,141,105,23,30,104,22,239,80,185,75,90,22,208,17]
"time"="2024-05-24 15:37:46.922442" "level"="debug" "logger"="node/retrieval" "msg"="retrieval failed" "chunk_address"="7bb8257947db81bd1722918a7bbd3c1ebd8483cc561a0e2c7521c627a11c7ecc" "error"="context canceled"
"time"="2024-05-24 15:37:46.922456" "level"="debug" "logger"="node/api" "msg"="replicas.getter.Get: so<-next" "addr"=[123,184,37,121,71,219,129,189,23,34,145,138,123,189,60,30,189,132,131,204,86,26,14,44,117,33,198,39,161,28,126,204] "err"="context canceled"
"time"="2024-05-24 15:37:46.922472" "level"="debug" "logger"="node/retrieval" "msg"="retrieval failed" "chunk_address"="ab2cfc3671935481246dd99f7e3829c0ef678d69171e6816ef50b94b5a16d011" "error"="context canceled"
"time"="2024-05-24 15:37:46.922485" "level"="debug" "logger"="node/api" "msg"="replicas.getter.Get: so<-next" "addr"=[171,44,252,54,113,147,84,129,36,109,217,159,126,56,41,192,239,103,141,105,23,30,104,22,239,80,185,75,90,22,208,17] "err"="context canceled"

Note that the joiner sees an rLevel of 4! It should be 0. The other logs are coming from: https://github.com/ldeffenb/bee/blob/b50dc2780587e8a9c76bcfe9fcb0656dd8c37142/pkg/replicas/getter.go#L125 and https://github.com/ldeffenb/bee/blob/b50dc2780587e8a9c76bcfe9fcb0656dd8c37142/pkg/replicas/getter.go#L131
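
To make the expectation concrete: with Swarm-Redundancy-Strategy: 0 and Swarm-Redundancy-Fallback-Mode: false, an rLevel of 0 should reach the joiner and bypass the replica dispatch entirely. The sketch below is illustrative only; the types and constructor are hypothetical stand-ins (not bee's actual wiring), and the storage.Getter shape is assumed from the linked sources.

```go
package sketch

import (
	"context"

	"github.com/ethersphere/bee/pkg/storage"
	"github.com/ethersphere/bee/pkg/swarm"
)

// replicaGetter is a stand-in for the replicas getter seen in the logs above:
// for a redundancy level > 0 it would dispatch the root address plus its
// replica addresses and take the first success (elided here).
type replicaGetter struct {
	base   storage.Getter
	rLevel int
}

func (r *replicaGetter) Get(ctx context.Context, addr swarm.Address) (swarm.Chunk, error) {
	// The real getter would race addr against its replica addresses.
	return r.base.Get(ctx, addr)
}

// chooseGetter sketches the guard this report expects: an rLevel of 0 coming
// from the request headers should select the base getter directly, so only
// the referenced chunk is ever requested; any higher level opts in to the
// replica machinery.
func chooseGetter(base storage.Getter, rLevel int) storage.Getter {
	if rLevel == 0 {
		return base
	}
	return &replicaGetter{base: base, rLevel: rLevel}
}
```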

ldeffenb · May 24 '24 15:05

@ldeffenb https://github.com/ethersphere/bee/pull/4711 closes this issue, right?

istae · Jul 16 '24 19:07

@ldeffenb #4711 closes this issue, right?

Yes, it should. Marking this closed pending my testing once I merge the latest changes and back out my temporary hack.

ldeffenb · Jul 16 '24 22:07