
[ipfs/go-bitswap] Integration between Graphsync and IPFS

Open petar opened this issue 4 years ago • 6 comments

The goal is to enable bitswap to support different methods of fetching a block, so that it can access non-bitswap sources like filecoin nodes which may use graphsync (via https://github.com/filecoin-project/go-data-transfer) and eventually other payment-based methods.

Fundamentally, Bitswap brokers information about which peers have a cid. This is captured in the form (cid, peer_id). It is implied that the method of fetching is the bitswap transfer protocol.

To generalize Bitswap, we need to change the information that is associated with a cid. For each cid, we would like to keep track of multiple "routing expressions" each of which describes a different method to fetch the block.

Routing expressions are expressions in the routing language syntax, which represent valid descriptions of methods to fetch a block, according to the existing Routing Language Spec.

For instance,

     fetch(
          cid=link("Qm15"),
          proto=bitswap,
          providers=[multiaddr("/ip4/8.1.1.9:44")],
     )

or

     fetch(
          cid=link("Qm15"),
          proto=graphsync,
          graphsync_voucher=0x12ef78cd,
          providers=[multiaddr("/ip4/8.1.1.9:44")],
     )

In essence, the routing information brokered should be of the form (cid, list of routing expressions).
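A minimal Go sketch of the (cid, list of routing expressions) shape described above. The names `RoutingExpression` and `RoutingTable` and their string-keyed fields are hypothetical, chosen only for illustration; they are not taken from go-bitswap or the Routing Language Spec. A source that is not a peer can leave `Providers` empty and carry its details in `Params`.

```go
package main

import "sync"

// RoutingExpression is a hypothetical Go rendering of one routing expression:
// a single, self-contained description of one way to fetch a block.
type RoutingExpression struct {
	Proto     string            // e.g. "bitswap" or "graphsync"
	Providers []string          // multiaddrs; may be empty for sources that are not peers
	Params    map[string]string // protocol-specific arguments, e.g. a graphsync voucher
}

// RoutingTable maps a CID (kept as a plain string here) to every known
// routing expression for it.
type RoutingTable struct {
	mu      sync.Mutex
	entries map[string][]RoutingExpression
}

func NewRoutingTable() *RoutingTable {
	return &RoutingTable{entries: make(map[string][]RoutingExpression)}
}

// Add records one more source for cid. Expressions are appended rather than
// merged, so heterogeneous sources stay independent of each other.
func (t *RoutingTable) Add(cid string, e RoutingExpression) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.entries[cid] = append(t.entries[cid], e)
}

// Lookup returns a copy of the expressions currently known for cid.
func (t *RoutingTable) Lookup(cid string) []RoutingExpression {
	t.mu.Lock()
	defer t.mu.Unlock()
	return append([]RoutingExpression(nil), t.entries[cid]...)
}
```

Keeping a flat list per CID means a new kind of source only needs a new `Proto` value, not a change to the table.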

This entails changes to every part of bitswap that touches routing information (for cids):

  • The first (of two) entry points of new routing information is access to the DHT, which is abstracted behind the interface ProviderFinder. This interface has to be generalized accordingly, essentially to match the generic composable routing interface. This interface should also be moved to the go-composable-routing repo (it does not belong in bitswap).

    • ProviderFinder is implemented by ProviderQueryManager, which sits as middleware between bitswap and the DHT and adds throttling, deduplication, and batching to routing calls. ProviderQueryManager must be:
      • generalized to use the composable routing interface (to make it middleware officially)
      • ideally broken down into independent middleware blocks (batching, throttling, dedup) which are chained
      • moved to go-composable-routing repo
  • The second (of two) entry points of new routing information is the reception of "have" messages from the bitswap gossip protocol. On reception, the "have" information must be converted into a routing expression, so that it can be treated in the same manner as other routing information downstream.

  • The logic that reacts to new routing information must also be updated. At the moment the only routing information that enters bitswap is "have" information, and it is acted on immediately by firing/queuing respective "want" requests. Going forward, routing information that corresponds to "have" messages can be treated as before. However, we need to decide how to schedule fetching from non-bitswap sources (like filecoin/graphsync) and, more generally, how to prioritize/parallelize fetching from different sources (bitswap and non-bitswap).

Remarks: This is an absolute minimum plan to enable the integration. Going forward, a lot of additions can be made to improve the scale and speed of the routing process in bitswap. E.g. the "have" messages can be generalized to communicate multiple sources for a block, so that peers can share with each other knowledge about where else the block can be downloaded. E.g. "I have the block, but I also know that this filecoin miner has the block you want too, and they also have the entire directory where the block lives."

Related IPFS / Filecoin interop plan: https://hackmd.io/JoZiAAtnTpqAKuQaEUra4g

PRs comprising the resolution of this issue: Step 1: https://github.com/ipfs/go-bitswap/pull/512

Follow-up tasks: After this issue is resolved, the following (smaller) issues must be addressed before IPFS is fully ready to talk to the Golden Path product: https://github.com/ipfs/go-bitswap/issues/509, https://github.com/ipfs/go-bitswap/issues/510

petar, Jun 24 '21 16:06


This statement doesn't make a lot of sense. I assume you're referring to some form of meta exchange that can use both the bitswap protocol and graphsync?

> To generalize Bitswap, we need to change the information that is associated with a cid. For each cid, we would like to keep track of multiple "routing expressions" each of which describes a different method to fetch the block.

This could use a lot more motivation. I'd expect the flow to be:

  1. I find out who has what. This is a mapping of CID -> PID.
  2. I connect to peers, then request content via whatever protocol they support.

Of course, I might want additional information before I bother to make a connection. For example:

  • Supported protocols.
  • Pricing.
  • Maybe vouchers? Really, more like arbitrary "tokens". Effectively "curried" arguments.

But then I'd expect the record to look more like:

{
    provider: PeerID,
    content: cid,
    protocols: {
        "/ipfs/bitswap/1.1.0": {...},
        "/ipfs/graphsync/1.0.0": {"token": ..., "price": ....}, // needs to specify across payment systems.
    }
}

Eventually, "queries" could be extended to select things like "supports graphsync but charges less than X".
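One way to express the record above and the "supports graphsync but charges less than X" query in Go. The type and field names are assumptions, as is the integer `Price`: the record deliberately leaves the payment unit open ("needs to specify across payment systems"), so a bare integer stands in for whatever that becomes.

```go
package main

// ProtocolInfo holds the per-protocol details a provider might advertise.
// Price is a placeholder integer in an unspecified unit (an assumption).
type ProtocolInfo struct {
	Token string // opaque "curried" argument, e.g. a voucher
	Price uint64
}

// ProviderRecord mirrors the shape of the record sketched above.
type ProviderRecord struct {
	Provider  string                  // peer ID
	Content   string                  // CID
	Protocols map[string]ProtocolInfo // keyed by protocol ID, e.g. "/ipfs/graphsync/1.0.0"
}

// SupportsCheaperThan keeps the records that advertise proto at a price
// strictly below limit.
func SupportsCheaperThan(recs []ProviderRecord, proto string, limit uint64) []ProviderRecord {
	var out []ProviderRecord
	for _, r := range recs {
		if info, ok := r.Protocols[proto]; ok && info.Price < limit {
			out = append(out, r)
		}
	}
	return out
}
```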

Stebalien, Jul 04 '21 22:07

@Stebalien:

> I find out who has what. This is a mapping of CID -> PID.

This is how things work today. We'd like to generalize this significantly: a source for a CID's content need not be a peer at all. For instance, it could be a legacy FTP service at a given IP, or a BitTorrent link (which doesn't even refer to a specific host). A routing expression can describe any such method.

> {
>     provider: PeerID,
>     content: cid,
>     protocols: {
>         "/ipfs/bitswap/1.1.0": {...},
>         "/ipfs/graphsync/1.0.0": {"token": ..., "price": ....}, // needs to specify across payment systems.
>     }
> }

Since discoverable sources for a CID may be heterogeneous (e.g. a peer using bitswap, a Filecoin miner using graphsync, a GitHub repo at a given commit), each CID is associated with a list of routing expressions, each of which describes one individual source. This is in contrast to having a single CID record (like the one above) that tries to describe all sources.

petar, Jul 08 '21 15:07

Ah, I see. Yeah, that makes a lot of sense. So we'd have an engine on top of bitswap that handles the generalized content routing records and passes information into each protocol.

> Since discoverable sources for a CID may be heterogeneous (e.g. a peer using bitswap, filecoin miner using graphsync, github repo at a given commit, etc), each CID is associated with a list of routing expressions, each of which describes some individual source. This is in contrast to having a single CID record (as the one above) that tries to describe all sources.

Makes sense.

Stebalien, Jul 08 '21 23:07

My recommendation is to do https://github.com/ipfs/go-bitswap/pull/512 to abstract the content routing source, add the ability to talk to indexers once they exist, and stop till we understand the direction we're heading.

As I see it, there are two paths to Golden Path in IPFS:

  • Just lean into bitswap, minimal changes -- get miners to turn on bitswap in their markets process, backed by a blockstore that reads from unsealed pieces using the miner index, serve only free data, do the minimum amount of work to get Bitswap to talk to indexers as well as the DHT, call it a day. Once global indexes and miner indexes exist, that's a pretty short project -- maybe 2-3 months. Gets you to Golden Path free retrievals in go-ipfs, with any miner that will actually leave bitswap on (I'm not sure how many there are of these). I always call this the @alanshaw solution because he originally proposed it.

  • Actually get go-ipfs to switch protocols between Bitswap and graphsync, speak data transfer, do free and paid retrievals, etc. And I have some strong opinions on it:

    • The go-bitswap library should get SMALLER, not larger -- it's already a beast. IMHO, go-bitswap should become a Bitswap protocol implementation, not a generalized content fetching implementation. The longer we keep saying "let's just throw more stuff in go-bitswap" that has nothing to do with the bitswap protocol -- even speaking other protocols like Graphsync -- honestly the more confusion we create about what a "bitswap" is. (Keep in mind, JS bitswap doesn't even have sessions.) I think sessions, content routing, etc. need to move up into some kind of meta library.
    • Actually implementing this protocol mixing and routing mixing needs a bunch of people's input. I've worked on all these libraries for years and authored the first implementations of some of them. I still don't think I know the absolute best way to do it. There are so many questions:
      • What's the unit of transfer above a block -- is it a data transfer? Is it a DAG? Is it analogous to a Session (which can be lots of DAG/Block requests, related simply by programmer choice)?
      • Also, what's the hierarchy of moving pieces in terms of libraries? I have thought at times go-data-transfer is the "meta" library above bitswap and graphsync that we're talking about, but currently it has no routing. Maybe it is something else.
      • What I really LIKE in this proposal is the idea of a universal routing stream. It's actually pretty great IMHO. One thing I've thought about a lot is that we should add the equivalent of WANT-HAVE to graphsync, which would ask "do you have this DAG?" The response could be Yes/No, but a yes could also include the CIDs without the blocks. I outlined this proposal here: https://github.com/protocol/beyond-bitswap/issues/25 & here https://github.com/ipld/specs/pull/355
      • I really like @raulk's idea of a universal per-request event bus that all layers communicate on. I think there's a kernel of an idea there for how all the things can come together.
      • Long and short of it: it's a big, hard problem, and we ought to have a team of us thinking about and researching an approach together if we go this path.
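To make the graphsync WANT-HAVE idea concrete, here is a hypothetical pair of message types and a responder in Go. None of these names come from the graphsync spec or the linked proposals; the selector is carried but ignored in this sketch, and `BlockIndex` stands in for whatever local index a node would actually consult.

```go
package main

// DAGHaveRequest asks a peer whether it holds the DAG rooted at Root.
type DAGHaveRequest struct {
	Root     string
	Selector []byte // hypothetical serialized IPLD selector; ignored here
}

// DAGHaveResponse answers yes/no; a "yes" may also list the CIDs in the DAG
// without sending the blocks themselves.
type DAGHaveResponse struct {
	Have bool
	CIDs []string
}

// BlockIndex is a hypothetical local lookup: root CID -> CIDs of every block
// in that DAG.
type BlockIndex map[string][]string

// AnswerDAGHave builds a response from the local index. When listCIDs is
// false, only the yes/no bit is sent.
func AnswerDAGHave(idx BlockIndex, req DAGHaveRequest, listCIDs bool) DAGHaveResponse {
	cids, ok := idx[req.Root]
	if !ok {
		return DAGHaveResponse{Have: false}
	}
	resp := DAGHaveResponse{Have: true}
	if listCIDs {
		resp.CIDs = append([]string(nil), cids...) // copy so callers can't mutate the index
	}
	return resp
}
```

Returning CIDs without blocks lets the requester plan a multi-source fetch of the DAG before committing to any one transfer.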

So my take is: do the part that is needed for either approach and stop. There's no progress to be made for real until miner indexes actually exist anyway. Especially if we do the great Web3 future data transfer stack refactor, we need a wide set of folks working on it. If we want to do something further, I would allocate a team of folks with deep experience in our data transfer protocols and content routing to do planning for how to actually refactor our libraries top to bottom to deliver on the needs for mixing filecoin and IPFS. This would at least help us determine how much work we're actually talking about, and when we could realistically deliver it.

hannahhoward, Jul 09 '21 01:07