
feat: soc dispersed replica

nugaon opened this issue 11 months ago · 0 comments

Checklist

  • [ ] I have read the coding guide.
  • [ ] My change requires a documentation update, and I have done it.
  • [ ] I have added tests to cover my changes.
  • [ ] I have filled out the description and linked the related issues.

Description

Add support for dispersed replicas on Single Owner Chunks (SOCs) to improve data availability and retrieval reliability in the Swarm network.

SOC replicas are generated by allowing additional addresses to represent the same SOC. This is achieved by relaxing SOC validation so that it ignores the first byte of the address. Since nodes arrange themselves into neighborhoods based on address prefix, this makes it possible to spread dispersed replicas evenly across the whole network. Replica addresses are created by iterating over all bit variations at a depth of redundancy level + 1 (e.g. if the level is 2 and the original address starts with 101, SOCs are uploaded with addresses identical after the first 3 bits, where the first-3-bit variations are 001, 011, 111 (101 is excluded because the original address already has it) and 100 (flipping the last bit)).
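A minimal sketch of the prefix enumeration idea, assuming the simplest variant: since validation ignores the first byte, replicas only need to vary the top bits of that byte, and we enumerate every prefix of the given depth except the original's. Function and variable names here are illustrative, not Bee's actual implementation (which may select a subset of variations, as the example above suggests).

```go
package main

import "fmt"

// replicaFirstBytes enumerates candidate first bytes for dispersed
// replicas of a SOC. It iterates over all variations of the top
// `depth` bits of the original first byte, skipping the original
// prefix itself, and preserves the remaining low bits.
func replicaFirstBytes(orig byte, depth uint) []byte {
	var out []byte
	for v := 0; v < 1<<depth; v++ {
		candidate := byte(v) << (8 - depth) // place the variation in the top bits
		rest := orig & (0xFF >> depth)      // keep the original low bits
		b := candidate | rest
		if b>>(8-depth) == orig>>(8-depth) {
			continue // skip the prefix the original address already has
		}
		out = append(out, b)
	}
	return out
}

func main() {
	// Original address starts with byte 0b10110110 (top 3 bits: 101).
	orig := byte(0b10110110)
	for _, b := range replicaFirstBytes(orig, 3) {
		fmt.Printf("top bits %03b -> first byte %08b\n", b>>5, b)
	}
}
```

Because neighborhoods are keyed by address prefix, each generated first byte lands the replica in a different neighborhood than the original chunk.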

Open API Spec Version Changes (if applicable)

Feed and SOC replica PUT endpoints, swarm-redundancy-level header: create and push dispersed replicas according to the passed level: MEDIUM 2, STRONG 4, INSANE 8, PARANOID 16 replicas.
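The level-to-replica-count mapping above can be sketched as follows. The numeric header values (0–4) are an assumption for illustration; only the level names and replica counts come from the text.

```go
package main

import "fmt"

// RedundancyLevel mirrors the swarm-redundancy-level header values.
// The 0-4 numbering is assumed here for the sketch.
type RedundancyLevel int

const (
	None     RedundancyLevel = iota // 0: no dispersed replicas
	Medium                          // 1: 2 replicas
	Strong                          // 2: 4 replicas
	Insane                          // 3: 8 replicas
	Paranoid                        // 4: 16 replicas
)

// ReplicaCount returns how many dispersed replicas to create and push
// for a given level, per the PUT endpoint description.
func (l RedundancyLevel) ReplicaCount() int {
	if l <= None {
		return 0
	}
	return 1 << l // 2, 4, 8, 16
}

func main() {
	for l := None; l <= Paranoid; l++ {
		fmt.Printf("level %d -> %d replicas\n", l, l.ReplicaCount())
	}
}
```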

Feed and SOC replica GET endpoints, swarm-redundancy-level header: calibrates how deeply dispersed replicas are checked. By default it is zero. The redundancy level affects lookup time, since replicas are fetched in batches with a 300 ms timeout per batch (up to 1200 ms at the highest redundancy level).

Motivation and Context (Optional)

  • The current SOC implementation has a single point of failure: if the nodes storing a chunk become unavailable, the content is inaccessible
  • Outages in poorly saturated neighborhoods can lead to temporary or permanent content loss
  • Critical data requires higher availability guarantees than a single storage location can provide

Related Issue (Optional)

Screenshots (if appropriate):

Drawbacks

More complex validation and retrieval logic, which comes with processing overhead. Retrieval time with redundancy may be worse than simple feed lookups -> testing required.

nugaon · Mar 20 '25 14:03