helia Multiple/alternative data retrieval implementations

Open achingbrain opened this issue 1 year ago • 1 comments

Currently Helia takes a blockstore that it enhances with bitswap. This creates a hard dependency on bitswap.

To enable experimentation with, and adoption of, faster or more use-case-specific retrieval protocols (CAR files, GraphSync, XYZNewFutureProtocol, etc.) we should allow this to be a configuration option.
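As a rough illustration of what "a configuration option" could mean, here is a minimal sketch in which retrieval is a list of pluggable implementations tried in order. Everything in it is hypothetical: `BlockRetriever`, `retrieveFirst` and the `CIDLike` stand-in type are names made up for this example and are not part of Helia.

```typescript
// Stand-in for the CID class from multiformats, so the sketch has no dependencies.
interface CIDLike { toString(): string }

// Hypothetical plug-in point: bitswap, a CAR fetcher, GraphSync, etc.
// would each provide one of these.
interface BlockRetriever {
  name: string
  retrieve (cid: CIDLike): Promise<Uint8Array>
}

// Try each configured retriever in order and return the first successful result.
// If all of them fail, throw a single error that lists every failure.
async function retrieveFirst (retrievers: BlockRetriever[], cid: CIDLike): Promise<Uint8Array> {
  const failures: string[] = []

  for (const retriever of retrievers) {
    try {
      return await retriever.retrieve(cid)
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err)
      failures.push(`${retriever.name}: ${message}`)
    }
  }

  throw new Error(`No retriever could fetch ${cid.toString()} (${failures.join('; ')})`)
}
```

Trying retrievers sequentially is only one possible policy; racing them or picking one per CID codec would be equally valid under the same interface.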

At this point blocks may not be the correct abstraction, since they limit us to a block as the unit of data you get in response to a CID.

A better read abstraction might map a CID to a stream of Uint8Arrays? Then the underlying retrieval method can apply whatever optimisations it can to fetch the data quickly, and the calling code doesn't have to keep going back to fetch another block for another CID.

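```ts
import type { CID } from 'multiformats/cid'

interface Options {
  // where in the content to start reading
  offset?: number
  // how much of the content to read
  length?: number
}

interface ContentReader {
  get (cid: CID, options: Options): AsyncGenerator<Uint8Array>
}
```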
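To make the proposed shape more concrete, here is a minimal sketch of one possible `ContentReader`, backed by an in-memory map. It repeats the two interfaces so that it is self-contained. The assumptions are mine, not the issue's: `CID` is replaced by a tiny stand-in type rather than the real class from `multiformats`, `offset` and `length` are interpreted as byte counts, and `createMemoryReader` is a hypothetical name.

```typescript
// Stand-in for the CID class from multiformats, so the sketch has no dependencies.
interface CID { toString(): string }

interface Options {
  offset?: number
  length?: number
}

interface ContentReader {
  get (cid: CID, options: Options): AsyncGenerator<Uint8Array>
}

// Hypothetical reader backed by a map of CID string -> ordered chunks of content.
// Assumes offset and length are measured in bytes.
function createMemoryReader (content: Map<string, Uint8Array[]>): ContentReader {
  return {
    async * get (cid: CID, options: Options = {}): AsyncGenerator<Uint8Array> {
      const chunks = content.get(cid.toString())

      if (chunks == null) {
        throw new Error(`Content not found for ${cid.toString()}`)
      }

      let skip = options.offset ?? 0
      let remaining = options.length ?? Infinity

      for (const chunk of chunks) {
        if (remaining <= 0) {
          return
        }

        // this chunk lies entirely before the requested offset
        if (skip >= chunk.byteLength) {
          skip -= chunk.byteLength
          continue
        }

        const end = Math.min(chunk.byteLength, skip + remaining)
        const slice = chunk.subarray(skip, end)
        remaining -= slice.byteLength
        skip = 0

        yield slice
      }
    }
  }
}
```

A caller would consume it with `for await (const buf of reader.get(cid, { offset: 2, length: 3 })) { ... }` and never needs to know how the bytes were split into blocks underneath.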

Questions:

- Does this shift the complexity of interpreting block data onto the content reader?
- What does the writer interface look like?
  - Can the writer/reader interfaces be asymmetric? E.g. CIDs/Blocks in, CID/Stream out?
- Does this assume file data?
- What about structures like unixfs where the root block has file metadata and then file data in leaf nodes?
- If DAGs are all dag-pb, dag-cbor or dag-json, can we make some assumptions about structure?
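One way to picture the asymmetric "CIDs/Blocks in, CID/Stream out" option is the sketch below. It does not answer the questions above; it only shows one possible shape. All names are hypothetical (`ContentWriter`, `DecodedBlock`, `createMemoryStore`), `CID` is a stand-in type rather than the class from `multiformats`, and the codec knowledge is injected as a `decode` function so the store itself stays format-agnostic.

```typescript
// Stand-in for the CID class from multiformats, so the sketch has no dependencies.
interface CID { toString(): string }

// Hypothetical: what a codec-aware helper extracts from a single block.
interface DecodedBlock {
  data?: Uint8Array // payload bytes carried by this block, if any
  links: CID[]      // child CIDs, in traversal order
}

// Write side: individual blocks keyed by CID.
interface ContentWriter {
  put (cid: CID, block: Uint8Array): Promise<void>
}

// Read side: a stream of payload bytes for the whole DAG under a CID.
interface ContentReader {
  get (cid: CID): AsyncGenerator<Uint8Array>
}

function createMemoryStore (decode: (block: Uint8Array) => DecodedBlock): ContentWriter & ContentReader {
  const blocks = new Map<string, Uint8Array>()

  // Depth-first walk: yield this block's payload (if any), then each child in order.
  async function * walk (cid: CID): AsyncGenerator<Uint8Array> {
    const block = blocks.get(cid.toString())

    if (block == null) {
      throw new Error(`Block not found for ${cid.toString()}`)
    }

    const { data, links } = decode(block)

    if (data != null && data.byteLength > 0) {
      yield data
    }

    for (const link of links) {
      yield * walk(link)
    }
  }

  return {
    async put (cid, block) {
      blocks.set(cid.toString(), block)
    },
    get: walk
  }
}
```

In this arrangement the complexity of interpreting block data does move to the read side, but it is isolated in `decode`, which is where assumptions about dag-pb, dag-cbor or dag-json structure would live.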

Apr 18 '23 08:04 achingbrain
