
Add option to skip traversal with stewardship endpoint

Open agazso opened this issue 2 years ago • 4 comments

Summary

It would be good if the /stewardship GET endpoint accepted an optional parameter that skips the traversal of the data, so that it would be possible to check the availability of a single chunk only.

More context in #3205

Motivation

I wanted to write a tool that checks whether the individual chunks of a dataset are available on the network, and I wanted to use the /stewardship GET endpoint for that. However, it turned out that the endpoint has additional logic in it: it recognizes root chunks and immediate chunks or manifest root chunks, and then traverses all the chunks that belong to the data set. As a result, the checks can become very expensive, and the user needs additional logic on their side to differentiate between the different kinds of chunks.

Implementation

There could be an optional query parameter (e.g. traverse=false or skipTraversal or something like that) that, when specified, skips the traversal logic and simply tries to fetch the given chunk from the network.
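As a rough sketch of how the parameter could be read on the server side (this is not the code from the branch mentioned below, and the name `traverse` is just one of the suggestions above), a handler could parse the query string and default to the current behaviour when the parameter is absent:

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// parseTraverse reports whether traversal should run for the given raw
// query string. The parameter name "traverse" is hypothetical. It defaults
// to true so that existing clients keep the current behaviour.
func parseTraverse(rawQuery string) (bool, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return false, err
	}
	raw := values.Get("traverse")
	if raw == "" {
		// Parameter missing or empty: keep traversing as today.
		return true, nil
	}
	// ParseBool accepts values such as "true", "false", "1" and "0"
	// and returns an error for anything else.
	return strconv.ParseBool(raw)
}

func main() {
	for _, q := range []string{"", "traverse=false", "traverse=true"} {
		traverse, err := parseTraverse(q)
		fmt.Printf("query=%q traverse=%v err=%v\n", q, traverse, err)
	}
}
```

When `parseTraverse` returns false, the handler would skip the traversal and only attempt to fetch the single chunk at the given address. An invalid value would be reported to the caller as a bad request rather than silently ignored.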

I created an example implementation that does this in the https://github.com/ethersphere/bee/tree/feat/stewardship-skip-traversal branch, but I understand that it is not production quality, so I don't expect it to be merged.

agazso avatar Sep 02 '22 10:09 agazso

Actually, there are 3 different use cases for the /stewardship API, both GET and PUT.

  1. Current operation which traverses an entire manifest if the reference "smells" like one, and also traverses all of the chunks of a non-manifest /bytes reference (BMT joiner). Really only useful for small manifests or files.
  2. An option that only does the full /bytes reference (BMT joiner), but does NOT traverse the manifest. Useful for clients that do their own explicit mantaray manifest processing. (https://github.com/ethersphere/mantaray-js)
  3. The option described above which only checks the exact specified chunk. Useful for clients that do their own BMT processing (https://github.com/fairDataSociety/bmt-js)
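To make the distinction concrete, the three cases could be modelled as a mode value that decides which traversal steps run. This is a hypothetical sketch only: the type, the constants and the mode strings are made up for illustration and are not part of the Bee API.

```go
package main

import "fmt"

// CheckMode is a hypothetical name for the three behaviours listed above.
type CheckMode int

const (
	ModeFull  CheckMode = iota // case 1: traverse the manifest and all file chunks
	ModeBytes                  // case 2: traverse the /bytes chunk tree, not the manifest
	ModeChunk                  // case 3: check only the exact specified chunk
)

// parseMode maps a (hypothetical) query value to a CheckMode.
// An empty value keeps the current behaviour.
func parseMode(s string) (CheckMode, error) {
	switch s {
	case "", "full":
		return ModeFull, nil
	case "bytes":
		return ModeBytes, nil
	case "chunk":
		return ModeChunk, nil
	}
	return ModeFull, fmt.Errorf("unknown mode %q", s)
}

// TraversesManifest reports whether manifest entries are followed.
func (m CheckMode) TraversesManifest() bool { return m == ModeFull }

// TraversesBytes reports whether the chunk tree under the reference is walked.
func (m CheckMode) TraversesBytes() bool { return m != ModeChunk }

func main() {
	for _, s := range []string{"", "bytes", "chunk"} {
		m, err := parseMode(s)
		fmt.Println(s, m.TraversesManifest(), m.TraversesBytes(), err)
	}
}
```

A single mode value like this would cover both the request in this issue (case 3) and the manifest-free traversal (case 2) without needing two separate boolean flags.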

Both @mfw78 and I are doing option 2 with our large manifests on the swarm.

ldeffenb avatar Sep 02 '22 11:09 ldeffenb

Since it is actually possible to retrieve single chunks using the /stewardship endpoint, we will for now close this issue.

istae avatar May 24 '23 11:05 istae

I disagree. If you hit the /stewardship endpoint with a chunk address that happens to be the root reference of a mantaray manifest, it will traverse and process the ENTIRE manifest. Unless I'm missing something in the API that constrains it to a single chunk?

ldeffenb avatar May 24 '23 12:05 ldeffenb

I see the point now. No, we do not have a query for this yet. It should be trivial to add though.

istae avatar May 24 '23 23:05 istae