IPIP-342: Ambient Discovery of Content Routers
This follows the previously circulated proposal outline at https://hackmd.io/bh4-SCWfTBG2vfClG0NUFg
A basic motivation is included in the PR, but essentially this is the best path I've heard for reducing our dependence on hydras as a centrally operated choke point for moving the bulk of the IPFS network beyond sole reliance on the current KAD DHT.
I'm missing a way to link content with the provider, because I seriously doubt that all parties will be eager to provide all the CIDs in the universe; they will focus on providing the content they care about.
We could link a root CID (or CIDs) with the provider to let nodes know where they can find the DAG for specific content.
This is a philosophical disagreement about what a 'content router' is. I'm going to refer you to the broad guidance that was presented over the summer for the evolution of content routing.
We currently have a couple examples of content routers that do have all the CIDs in the universe, and we do not have convincing examples of, or a definition for, sub-content-routers as you're proposing here.
Why are we trying to compromise to a much-harder-to-make-work complexity without trying for the thing that makes sense and the direction we're heading first?
@willscott I think this is less a philosophical disagreement than a physical one.
Right now we are able to keep all the CIDs on the network in one provider for two reasons:
- The network is relatively small
- Right now we have only two places where nodes are providing CIDs: DHT and Bitswap, so it is easier to get that information because it is only in two specific places.
When we start to have different ways of providing CIDs, it will be nearly impossible to have everything replicated by everyone.
Also, when the network scales, keeping all the information centralized in a few places will be quite challenging and costly. On the other hand, allowing both approaches (providing everything vs providing a subset of the CIDs) will surely have its use cases for people who do not have a huge amount of money to maintain big providers.
We currently have providers stepping up to provide full replicas of a content routing database. That is what network indexers have been doing over the last year. I don't see the physical issue here: indexers are handling trillions of records, vs the 100 billion in the DHT, and the scale we expect as we grow will still fit at the level of a single rack in a data center.
In designing delegated routing so far, the aim has been a design where delegated routers that don't possess a full replica themselves fall back and do the additional work of querying other routers to collect one. The alternative, making that the end kubo node's responsibility, leads to untenable performance and an untenable decision process for end user nodes that are not equipped to handle it.
I'm not entirely sure of your counter proposal here: I think there are very strong counter arguments against both https://github.com/ipfs/specs/pull/322, which compromises trying to be a content addressed network, and limited DHT providing (e.g. to roots), which still couldn't handle the current indexer database scale.
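To make the fallback idea above concrete, here is a minimal sketch, not taken from the IPIP or from any existing implementation. It assumes a router can be modeled as a function from a CID string to a set of provider IDs; `make_static_router` and `make_delegated_router` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Iterable, Set

# Assumption: a router is modeled as a function from a CID to provider IDs.
Lookup = Callable[[str], Set[str]]


def make_static_router(index: Dict[str, Set[str]]) -> Lookup:
    """Hypothetical router backed by an in-memory index."""
    def lookup(cid: str) -> Set[str]:
        # Return a copy so callers cannot mutate the index.
        return set(index.get(cid, set()))
    return lookup


def make_delegated_router(local: Lookup, peers: Iterable[Lookup]) -> Lookup:
    """Hypothetical delegated router: answer locally when possible,
    otherwise query the other routers on behalf of the end node."""
    peer_list = list(peers)

    def lookup(cid: str) -> Set[str]:
        providers = local(cid)
        if providers:
            return providers
        # Fallback: the router, not the end user node, does the extra work.
        merged: Set[str] = set()
        for peer in peer_list:
            merged |= peer(cid)
        return merged
    return lookup


partial = make_static_router({"cid-a": {"peer-1"}})
other = make_static_router({"cid-b": {"peer-2"}})
router = make_delegated_router(partial, [other])
```

In this sketch the end node only ever calls `router(cid)`; whether the answer came from the local index or from another router is invisible to it.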
I updated to hopefully address your review, @lidel
- There's a more concrete description of what the proposed protocol will look like
- I added a query tag to support discovery of more than just content routing through the mechanism
- In terms of 'is this only for ipni' / are you 'punished' if you're incomplete: I think it's probably too early to try to predict the dynamics here without either pretty extensive modeling or subsequent user data. The ranking / propagation of routers would be impacted by what users are after, so if there's a community or application that's focused on a narrow subset that's well addressed by a different router, that router could also be very viable with this discovery mechanism.
This came up in https://pl-strflt.notion.site/2023-05-30-Content-Routing-WG-12-b2ed74834fe44e359bbcdd02740e2084
There is going to be an implementation need for this in the next quarter or two. As a result, we want to get ahead of it before code gets written and decisions ossify.
Some next steps:
- Update the IPIP to the newest template standard, with lessons learned over the last 6 months
  - I assume @willscott or someone from PL EngRes Bedrock team will do this.
- Review from spec maintainers
  - I assume @lidel will take the first pass here on the updated doc when it's ready.
Implementation notes: "Lassie currently does not depend on boxo. It may be hard for us to prototype ambient-discovery in boxo until the boxo/rest-of-world dependency conflicts are resolved"