go-bitswap icon indicating copy to clipboard operation
go-bitswap copied to clipboard

Proposal: Streaming GetBlocks

Open Stebalien opened this issue 6 years ago • 11 comments

At the moment, it's a bit tricky to continuously prefetch blocks as one needs to launch a goroutine per batch.

Proposal: Change the signature of GetBlocks to take an input channel and return an output channel:

// Fetcher is an object that can be used to retrieve blocks (defined in go-ipfs-exchange-interface)
type Fetcher interface {
	// GetBlock returns the block associated with a given key.
	GetBlock(context.Context, cid.Cid) (blocks.Block, error)

	// GetBlocks returns a stream of blocks, given a stream of CIDs. It will
	// return blocks in any order.
	//
	// To wait for all remaining blocks, close the CID channel and wait for
	// the blocks channel to be closed. A closed channel does not mean that
	// _all_ blocks were retrieved, it just means that the fetcher is done
	// retrieving blocks.
	GetBlocks(context.Context, <-chan cid.Cid) (<-chan blocks.Block, error)
}

Additional elements:

  • The go-blockservice.BlockGetter interface should be replaced with the Fetcher interface.
  • ~The output channel should have the same buffer as the input channel (I think?).~
  • The error return type is for indicating that the request couldn't even be started (e.g., closed service). This interface doesn't really have a way to report runtime errors (and there's almost always nothing we can do about them anyways).

This will require changes to go-ipfs-exchange-interface, go-blockservice, and go-bitswap.

Stebalien avatar Dec 09 '19 21:12 Stebalien

The output channel should have the same buffer as the input channel (I think?).

I'm really not sure the best way to do this. We could also just have no backpressure at this layer and say that the caller shouldn't ask for more blocks than it can handle.

Stebalien avatar Dec 09 '19 21:12 Stebalien

This proposal LGTM - I'm not sure I understand what "the same buffer" means in this context - do you mean the size of the output channel buffer is the same as the size of the input channel buffer?

dirkmc avatar Dec 09 '19 21:12 dirkmc

do you mean the size of the output channel buffer is the same as the size of the input channel buffer

Yes. The question is really: how do we apply backpressure (and do we)?

Stebalien avatar Dec 10 '19 08:12 Stebalien

Currently in Bitswap

If Session.GetBlocks() takes a channel of CIDs it seems like it would make sense for it to do something similar, ie

  • pop wants off the incoming channel immediately and put them in a queue
  • buffer received blocks until the client is ready to process them

dirkmc avatar Dec 10 '19 16:12 dirkmc

SGTM.

Stebalien avatar Dec 10 '19 18:12 Stebalien

@aschmahmann and I talked it over and he suggested that there may be some scenarios in which back-pressure would be useful. For example if an app uses Bitswap to download a movie, it would be useful to provide a buffer size to the session such that Bitswap would stop popping wants off the input channel once the downloaded blocks exceed the buffer size. This would allow Bitswap to prioritize bandwidth for Sessions that need the data immediately.

On the other hand because Bitswap works best when it can request a lot of blocks in parallel, and because it doesn't provide a guarantee of the order of blocks the buffer size may not be that useful in practice.

dirkmc avatar Dec 10 '19 20:12 dirkmc

My concern is that bitswap will keep popping blocks off the queue until it starts downloading blocks. That means we need an infinite receive buffer anyways. Basically, the tradeoff is between backpressure and potential deadlocks and the backpressure may not be effective enough to make it worth it.

We should consider some kind of outstanding wants limit (total and per session). But that's a more general issue that we can probably tackle separately.

Stebalien avatar Dec 11 '19 08:12 Stebalien

I agree I think the glove doesn't quite fit so it's better to punt on back-pressure

dirkmc avatar Dec 11 '19 13:12 dirkmc

How then NodeGetter will work?

I am a bit ahead of the issue due to the development of the small protocol that is using the same pattern with input CID chan. A problem is that I still have not decided on the best way to make the pattern work with DAGService without changing its interface and I am curious about your plan here.

By the way, my protocol has a nice dynamic buffer for back-pressure with ordering which may match your needs.

Wondertan avatar Jun 01 '20 17:06 Wondertan

Yeah, this would require changes in the DAGService. We'd likely add an optional StreamBlocks() method.

Is there a description somewhere of go-blockstream's design? I'd love to take a look.

Stebalien avatar Jun 01 '20 17:06 Stebalien

Ok, thanks. I would like to help pushing this forward. Also, an intention for the PR here is to implement a new NavigableNode over the streaming interface. In case DAGService will have the new method for that we can simply reimplement it or still go with a new one.

Currently, I don't have any detailed description of the protocol, but going to write it soon after some changes needed for handling stream reset cases. Simply saying, blockstream removes unnecessary complexity from bitswap when block providers are already known and splits requests between them with order guarantees. Additionally, the codebase has convenient IPLD walking functionality over the streaming interface. In prospect, I want to integrate IPLD selectors in it - making a graphstream?😄

Wondertan avatar Jun 02 '20 14:06 Wondertan