
Wire index/add handler to write derived DUDEWHERE index

Open Gozala opened this issue 9 months ago • 3 comments

Context

Freeway uses a set of R2 buckets to provide its read interface:

  • CARPARK - CAR file storage area. Key format <CAR_CID>/<CAR_CID>.car
  • SATNAV - Indexes of block offsets within CARs. Key format <CAR_CID>/<CAR_CID>.car.idx, index format MultihashIndexSorted.
  • DUDEWHERE - Mapping of root data CIDs to CAR shard CID(s). Key format <DATA_CID>/<CAR_CID>.

How it works:

  1. Extract DATA_CID from URL.
  2. Look up CAR_CID(s) in DUDEWHERE.
  3. Read indexes from SATNAV.
  4. UnixFS export directly from CARPARK using index data to locate block positions.
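
The key formats and lookup steps above can be sketched roughly as follows. This is a minimal illustration, not Freeway's actual code: the helper names (`carparkKey`, `satnavKey`, `dudewhereKey`, `lookupCarCids`, `readPlan`) are hypothetical, CIDs are treated as plain strings, and the DUDEWHERE listing is modelled as an in-memory array of keys rather than an R2 bucket.

```typescript
// Hypothetical helpers: names are illustrative, key formats come from the bucket list above.
const carparkKey = (carCid: string): string => `${carCid}/${carCid}.car`;
const satnavKey = (carCid: string): string => `${carCid}/${carCid}.car.idx`;
const dudewhereKey = (dataCid: string, carCid: string): string => `${dataCid}/${carCid}`;

// Step 2: recover CAR CIDs from the DUDEWHERE keys that share the data CID prefix.
function lookupCarCids(dudewhereKeys: readonly string[], dataCid: string): string[] {
  const prefix = `${dataCid}/`;
  const carCids: string[] = [];
  for (const key of dudewhereKeys) {
    if (key.slice(0, prefix.length) === prefix) {
      carCids.push(key.slice(prefix.length));
    }
  }
  return carCids;
}

// Steps 3 and 4: for each shard, name the SATNAV index and the CARPARK object to read.
function readPlan(
  dudewhereKeys: readonly string[],
  dataCid: string
): { index: string; car: string }[] {
  return lookupCarCids(dudewhereKeys, dataCid).map((carCid) => ({
    index: satnavKey(carCid),
    car: carparkKey(carCid),
  }));
}
```

Without a DUDEWHERE record for a data CID, `lookupCarCids` returns an empty list and nothing can be read, which is the gap this issue describes for content added through blob/add.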

What

Derive and write DUDEWHERE index records from the DAG index passed into #1401.

Why

Otherwise content uploaded through blob/add will not be readable via Freeway.

Gozala, Apr 24 '24 19:04

Please note: today we write these indexes on upload/add https://github.com/w3s-project/w3up/blob/main/packages/upload-api/src/upload/add.js#L40

My first inclination was that we should remove that from upload/add and add it here. But then we would break the old store/add flow. We will probably need upload/add to accept both CAR links and multihashes as shards, and to distinguish what to do for each: the former is handled as it is today, while the latter writes the b58btc-encoded multihash after the dataCID.

What do you think @Gozala ?

vasco-santos, Apr 25 '24 08:04

I had not considered upload/add, and now I wonder whether index/add subsumes that functionality or whether we need both 🤔 In terms of what to do, I see the following options to choose from:

  1. We issue upload/add without any shards from new clients as they will be using blob/add & index/add anyway.
  2. We issue upload/add but with RAW CID shards, and then in the handler we can omit non-CAR links.
  3. We don't do any upload/add, but surface things added via index/add from upload/list.

From where I stand, the first option seems most rational, but given a good argument I can see the second as a good candidate too. The third option seems too drastic; I would prefer to do the 1st or 2nd now and consider the 3rd in a future follow-up.
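
For illustration, the filtering in option 2 might look something like the sketch below. It assumes each shard link exposes its multicodec code as a numeric `code` field; the `ShardLink` shape and the `carShardsOnly` name are hypothetical and not part of the upload/add handler. The codec values are the multicodec table entries for CAR (0x0202) and raw (0x55).

```typescript
// Multicodec codes: 0x0202 is CAR, 0x55 is raw.
const CAR_CODE = 0x0202;
const RAW_CODE = 0x55;

// Hypothetical minimal link shape: a codec code plus the CID as a string.
interface ShardLink {
  code: number;
  cid: string;
}

// Keep only CAR links so that non-CAR shards (for example raw blobs) are skipped.
function carShardsOnly(shards: readonly ShardLink[]): ShardLink[] {
  return shards.filter((shard) => shard.code === CAR_CODE);
}
```

Under this sketch the old store/add flow keeps working unchanged, because its shards are all CAR links and pass through the filter.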

Gozala, May 06 '24 21:05

Thinking a bit more about it, I think that in the future the upload list should simply be a list of CBOR objects like { root: Link, parts: Link<BlobAddReceipt>[] }, where root is a DAG root and parts are receipts for its parts. Perhaps parts should be links to invocations instead of receipts, so it could also represent in-progress uploads.
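
As a rough sketch, the proposed entry shape could be typed as below. Only `root` and `parts` come from the comment above; the `Link` and `BlobAddReceipt` definitions are hypothetical placeholders, and the CID strings in the example are made up.

```typescript
// Placeholder link type: a real implementation would use a CID-based link.
interface Link<T = unknown> {
  readonly cid: string;
  // Phantom field, never set: it only records what the link points to.
  readonly target?: T;
}

// Opaque placeholder for the receipt of a blob/add invocation.
interface BlobAddReceipt {
  readonly kind: "blob/add receipt";
}

// One upload list entry: a DAG root plus links to the receipts for its parts.
interface UploadListEntry {
  root: Link;
  parts: Link<BlobAddReceipt>[];
}

// Example entry with made-up CID strings.
const exampleEntry: UploadListEntry = {
  root: { cid: "bafyRoot" },
  parts: [{ cid: "bafyReceipt1" }, { cid: "bafyReceipt2" }],
};
```

Switching `parts` to links to invocations, as suggested above, would only change the type parameter on `Link`.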

Gozala, May 06 '24 21:05

Note: we decided not to do this; Freeway uses materialized location claims instead.

alanshaw, May 20 '24 15:05