# Automatic Legacy Feed Fallback
## Background
Bee supports two feed formats: legacy and new.
| Format | Contents of the Single Owner Chunk (SOC) | Fetch Steps | Notes |
|---|---|---|---|
| Legacy | 8-byte timestamp (uint64, big endian) + reference (32 bytes unencrypted / 64 bytes encrypted) | Fetch SOC → Read reference → Fetch payload | Indirect fetch; adds latency |
| New | Payload stored directly in the SOC | Fetch payload | Eliminates lookup; faster |
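For concreteness, here is a minimal sketch of how a legacy update payload decomposes into its two fields. `parseLegacyUpdate` is a hypothetical helper written for this document, not Bee's actual code:

```go
package feed

import (
	"encoding/binary"
	"errors"
)

// LegacyUpdate holds the two fields packed into a legacy feed SOC payload.
type LegacyUpdate struct {
	Timestamp uint64 // big endian on the wire
	Reference []byte // 32 bytes (unencrypted) or 64 bytes (encrypted)
}

// parseLegacyUpdate splits the 8-byte big-endian timestamp from the
// trailing Swarm reference.
func parseLegacyUpdate(payload []byte) (LegacyUpdate, error) {
	if len(payload) != 40 && len(payload) != 72 {
		return LegacyUpdate{}, errors.New("not a legacy feed payload")
	}
	return LegacyUpdate{
		Timestamp: binary.BigEndian.Uint64(payload[:8]),
		Reference: payload[8:],
	}, nil
}
```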
## Why the change
The new feed format improves performance by removing the intermediate lookup step. Because the payload can be fetched directly, one lookup round trip is eliminated, which lowers latency.
In performance-critical scenarios, such as live streaming (where one feed update corresponds to one multimedia segment), the new feed format provides a significant performance boost.
## Previous Feed Resolution Logic (Bee 2.6.x and possibly earlier)
Older Bee versions tried to automatically detect the feed type based on the payload size:
- 40 bytes (8-byte timestamp + 32-byte unencrypted reference)
- 72 bytes (8-byte timestamp + 64-byte encrypted reference)
If the payload matched one of these sizes, Bee assumed a legacy feed, skipped the first 8 bytes (timestamp), and treated the rest as a reference.
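Schematically, the old resolution step amounted to something like the following sketch, where `resolveFeedPayload` and `fetch` are hypothetical stand-ins for Bee's internals:

```go
package feed

import "context"

// resolveFeedPayload mimics the pre-2.7.0 size heuristic. fetch is a
// stand-in for Bee's chunk retrieval; both names are hypothetical.
func resolveFeedPayload(ctx context.Context, payload []byte,
	fetch func(context.Context, []byte) ([]byte, error)) ([]byte, error) {
	if n := len(payload); n == 40 || n == 72 {
		// Assumed legacy: skip the 8-byte timestamp, dereference the rest.
		return fetch(ctx, payload[8:])
	}
	// Assumed new format: the SOC payload is the content itself.
	return payload, nil
}
```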
This heuristic fails for payloads that coincidentally match these sizes. For example, a chat message that is exactly 40 or 72 bytes long would be misinterpreted as a legacy reference, leading to fetch errors.
Related issue:
- https://github.com/ethersphere/bee/issues/5027
## Deterministic Feed Resolution (Bee 2.7.0)
Bee 2.7.0 removes the size-based heuristic. The new feed format is now the default.
To resolve a feed as legacy, end users must explicitly append the following query parameter (there is no other way):
`?swarm-feed-legacy-resolve=true`
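For illustration, a minimal client opting in to legacy resolution could look like this (the manifest address is a placeholder):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// "<feed-manifest-address>" is a placeholder for a real feed manifest hash.
	url := "http://localhost:1633/bzz/<feed-manifest-address>/?swarm-feed-legacy-resolve=true"
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status) // "200 OK" when the legacy lookup succeeds
}
```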
Related issues:
- https://github.com/ethersphere/bee/issues/5156
- https://github.com/ethersphere/bee/issues/5157
## Feature Request for Feed Manifests in the /bzz/ Endpoint
While the new behavior of Bee 2.7.0 is a step in the right direction (using the new format works without edge cases), we need a graceful fallback mechanism when the legacy lookup is known to be necessary for a correct response.
| Bee Version | Feed Mode | Example URL | Result |
|---|---|---|---|
| 2.6.x | Size-based heuristic | `http://localhost:1633/bzz/2fdac7bd34a7e27e46e3fd2b6c67f13e3f04231ef92226b8d64a5f693ec80cae/` | HTTP 200 OK |
| 2.7.0 | New (default) | `http://localhost:1633/bzz/2fdac7bd34a7e27e46e3fd2b6c67f13e3f04231ef92226b8d64a5f693ec80cae/` | `{"code": 404, "message": "address not found or incorrect"}` |
| 2.7.0 | Legacy (explicit opt-in) | `http://localhost:1633/bzz/2fdac7bd34a7e27e46e3fd2b6c67f13e3f04231ef92226b8d64a5f693ec80cae/?swarm-feed-legacy-resolve=true` | HTTP 200 OK for the root document; subpaths fail |
## Proposed Solution
- Attempt to resolve feed manifests using the new feed format (status quo).
- If the result is HTTP 404, automatically retry with `?swarm-feed-legacy-resolve=true`.
- If the retry still returns 404, return the error to the client.
- Otherwise, return the successfully fetched content (HTTP 200).
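A minimal sketch of this flow, assuming a hypothetical `resolve` helper that performs the /bzz/ resolution and reports the HTTP status it would produce; none of these names are Bee's actual API:

```go
package api

import (
	"context"
	"net/http"
)

// resolveFn stands in for the /bzz/ resolution path; the boolean toggles
// legacy resolution. All names here are hypothetical.
type resolveFn func(ctx context.Context, addr string, legacy bool) (body []byte, status int, err error)

// fetchWithLegacyFallback tries the new feed format first and retries
// with legacy resolution only on a 404, per the proposal above.
func fetchWithLegacyFallback(ctx context.Context, addr string, resolve resolveFn) ([]byte, int, error) {
	body, status, err := resolve(ctx, addr, false)
	if err != nil || status != http.StatusNotFound {
		return body, status, err
	}
	// First attempt returned 404: retry once with the legacy lookup.
	// If this also 404s, the error is returned to the client unchanged.
	return resolve(ctx, addr, true)
}
```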
Hi @Cafe137,
Thanks for the issue. I have a bit of feedback here.
1. The automatic fallback is indeed feasible, however it requires the introduction of some hacks in the API. The business logic of the chunk indirection isn't so straightforward, and it requires rinse-and-repeat over large sections of the code with different parameters. Also, having these automagic fallbacks disincentivizes users from updating their feed versions (in other words, we're introducing permanent hacks into the API). It also makes the `swarm-feed-legacy-resolve` param add more complexity rather than improve performance/correctness (if both methods are tried, why do we need it in the first place?).
2. In point 2 you mention automatically retrying with `?swarm-feed-legacy-resolve=true`, so I understand you're suggesting to do a client redirect to this new URL? If so, it may break apps that are using the query parameters as a means to convey their own application-specific data. E.g. `https://localhost:1633/<uniswap_app_addr>/explore/tokens/ethereum/NATIVE?inputCurrency=NATIVE` turns into `https://localhost:1633/<uniswap_app_addr>/explore/tokens/ethereum/NATIVE?inputCurrency=NATIVE&swarm-feed-legacy-resolve=true`. Not sure it's either pretty or desirable (query fields should be set by the client, not injected with data by the server).
3. The "nicer" way to do this would be to "peek" into the two possibilities of addresses at the same time initially, and potentially race them, assuming that the "new" method always wins when the new feed is defined (sketched below). The only problem with this is that it will always look both ways (though it eliminates the guesswork of the rinse-and-repeat method). For applications that make heavy use of feeds (in volume) this might become problematic.
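For point 3, the racing idea could be sketched as follows (hypothetical names; `resolve` again stands in for the actual retrieval path). Note that the sketch always issues both lookups, which is exactly the volume concern raised above:

```go
package api

import "context"

// raceFeedLookups starts both lookups concurrently; the new format always
// wins when it resolves, and the legacy result is consulted only after the
// new lookup has definitively failed.
func raceFeedLookups(ctx context.Context, addr string,
	resolve func(ctx context.Context, addr string, legacy bool) ([]byte, error)) ([]byte, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // stop the losing lookup

	type result struct {
		body []byte
		err  error
	}
	newCh := make(chan result, 1)
	legacyCh := make(chan result, 1)

	go func() {
		b, err := resolve(ctx, addr, false)
		newCh <- result{b, err}
	}()
	go func() {
		b, err := resolve(ctx, addr, true)
		legacyCh <- result{b, err}
	}()

	if r := <-newCh; r.err == nil {
		return r.body, nil // the new feed is defined: it wins
	}
	r := <-legacyCh
	return r.body, r.err
}
```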
I don't have a very strong objection to this behavior (apart from the fact that it increases technical debt), however I think that dwelling on this a while longer might be beneficial. Ideally users should be aware that "legacy" is legacy, meaning it will die out. With this approach we just end up with two parallel versions, A and B, both automatically supported and living forever under the names "legacy" and "new".
WDYT about this?
Hi @acud,
Thank you for the thoughtful response.
1. I was thinking of handling this logic inside the high-level `api` package, so that the resolution and retrieval internals can be kept intact. If this is not possible, I am +1 for dwelling on this more.
2. This is a valid concern, and for that reason I was thinking of using internal redirections instead (sketched after this list). That way clients send only one request, and receive a 2xx instead of a 3xx, with and without the fallback code path.
3. I don't have a strong opinion against racing both versions rather than having a more basic try-and-fallback flow. I suggested the latter because it conveys the primary/fallback hierarchy more clearly, while racing suggests the two feed formats are equally preferred.
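What such an internal redirection might look like, sketched as Go middleware (hypothetical; `httptest.ResponseRecorder` is used only to keep the example short):

```go
package api

import (
	"net/http"
	"net/http/httptest"
)

// withLegacyFallback wraps a handler so that a 404 from the primary code
// path is retried internally with swarm-feed-legacy-resolve=true. The
// client sends one request and never sees a 3xx.
func withLegacyFallback(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("swarm-feed-legacy-resolve") != "" {
			next.ServeHTTP(w, r) // already explicit: no fallback needed
			return
		}
		// Buffer the primary attempt instead of writing it to the client.
		rec := httptest.NewRecorder()
		next.ServeHTTP(rec, r)
		if rec.Code != http.StatusNotFound {
			for k, vs := range rec.Header() {
				for _, v := range vs {
					w.Header().Add(k, v)
				}
			}
			w.WriteHeader(rec.Code)
			w.Write(rec.Body.Bytes())
			return
		}
		// 404: re-dispatch internally with legacy resolution enabled.
		q := r.URL.Query()
		q.Set("swarm-feed-legacy-resolve", "true")
		r2 := r.Clone(r.Context())
		r2.URL.RawQuery = q.Encode()
		next.ServeHTTP(w, r2)
	})
}
```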
For some additional context, my motivation is addressing two cases with this task. The first one is keeping historical and well-known feeds accessible, e.g. the OpenStreetMap dataset on Swarm. The second one is preserving backward compatibility for builders and uploaders who will eventually migrate to wrapped (new) feeds, but are not aware of it yet or haven't had the time to do so. The first one is problematic as we may never see a reupload for some datasets. The second one can be solved more simply by communicating the feed change and giving a transition period to the new format.
This effort needs to be looked at as a product problem and a communication task too, not just a technical design. We also need to keep in mind that we should not be introducing breaking changes lightly anymore, as Bee is a more mature product now with multiple entities depending on it. Unfortunately, this is not yet the case with the Bee 2.7 RC, where these feed incompatibility issues are already showing.
Essentially, the goal is having temporary backward compatibility while giving users enough time and information for a smooth transition.
Thanks @Cafe137 for the feedback. Continuing the conversation, to sum up 1+2+3: the problem with doing internal redirections is that it would lead to a lousy UX. The fact that you need to wait for a not-found error in order to just try the other version is sub-optimal; it would just add unnecessary latency to the API. From my perspective, true backwards compatibility is transparent and should necessarily mean that support for both formats doesn't have any implied additional performance impact (apart from, perhaps, the one related to the actual heuristic used to do the retrievals). So, on balance, having some well-defined racing algo to do the retrieval of both feeds (whenever necessary) would make for a better UX.
Additional takeaways:
- rename "legacy" to `v1` and "new" to `v2`, as both aren't going anywhere anytime soon
- start introducing versioning bits to data structures; it's never too late. This could save a lot of these endless discussions and very much simplify supporting older formats in both the API and the protocols, instead of a lot of guesswork (a sketch of the idea follows this list)
- totally agree on the communication and product aspects. Perhaps, for historical datasets that cannot expect a re-upload, we should consider providing a new "verified" layer of our own: v2 SOCs signed by the foundation (with a proof of the signature of the old feed, proving the migration)
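As an illustration of the versioning-bits idea, a one-byte version tag in front of each payload would let resolution branch on an explicit marker instead of a size heuristic. This is a purely illustrative layout, not an agreed wire format:

```go
package feed

import "errors"

// One-byte version tags; purely illustrative values.
const (
	feedV1 byte = 0x01 // "legacy": timestamp + reference
	feedV2 byte = 0x02 // "new": inline payload
)

// encodeVersioned prefixes a payload with its format version.
func encodeVersioned(version byte, body []byte) []byte {
	return append([]byte{version}, body...)
}

// decodeVersioned branches on an explicit tag instead of a size heuristic.
func decodeVersioned(p []byte) (version byte, body []byte, err error) {
	if len(p) == 0 {
		return 0, nil, errors.New("empty payload")
	}
	return p[0], p[1:], nil
}
```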
I shall proceed with implementing the race-based feed fetching.