
Concatenated upload index of all CAR indexes for a root CID

Open olizilla opened this issue 2 years ago • 5 comments

We hit issues when users send us a DAG split over > 1000 CARs, because we have to load a CAR index for each CAR before we can figure out where to fetch blocks from. If we create an upload index file for the root CID, as the concatenation of each CAR index, we only have to fetch a single file before we can start responding. I think this would solve the issue we're seeing in #46.

There remain edge cases where a single file is split over > 1000 CARs, but in those cases either the user is sending us CAR shards that are too small, or the file is massive. For example, if users stick with the 100MiB CAR shard size we provide, they'd upload a 32GiB DAG in 328 CARs, so we could tackle that as a lower priority issue.
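
The shard arithmetic above can be checked with a quick sketch. It assumes binary units (1 GiB = 1024 MiB) and ignores any per-CAR header overhead; `shard_count` is a hypothetical helper for illustration, not part of any upload API.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def shard_count(dag_bytes: int, shard_bytes: int = 100 * MIB) -> int:
    """Number of CAR shards needed for a DAG, rounding up (ceiling division)."""
    return -(-dag_bytes // shard_bytes)

# 32 GiB at the default 100 MiB shard size.
print(shard_count(32 * GIB))  # 328

# Largest DAG that fits in 1000 shards of 100 MiB, expressed in GiB.
print(1000 * 100 * MIB / GIB)  # 97.65625
```

Under these assumptions, only DAGs larger than roughly 97.7 GiB would exceed 1000 CARs at the default shard size.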

olizilla avatar Mar 15 '23 14:03 olizilla

As a low-risk first pass, we could create the concatenated upload index after each call to upload/add in the upload API. It's fine if the user calls it multiple times and adds CARs incrementally: we can rebuild the index from scratch each time, or read any existing one and make it smarter if we need to.
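
A minimal sketch of why repeated upload/add calls are safe, assuming the upload index is stored keyed by shard CID (an assumption made here purely so that re-adding a shard is a no-op). The `UploadIndex` shape and `on_upload_add` are hypothetical names, not the real upload API.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical shape: shard (CAR) CID -> raw bytes of that CAR's index.
UploadIndex = Dict[str, bytes]

def on_upload_add(existing: UploadIndex,
                  shards: Iterable[Tuple[str, bytes]]) -> UploadIndex:
    """Return the upload index after an upload/add call listing `shards`.

    Pass an empty `existing` to rebuild from scratch, or the stored index
    to extend it. Re-adding a shard that is already present is a no-op.
    """
    updated = dict(existing)  # copy, so the stored index is not mutated
    for car_cid, index_bytes in shards:
        updated[car_cid] = index_bytes
    return updated

# Placeholder CIDs and index bytes, for illustration only.
first = [("car-a", b"index-a"), ("car-b", b"index-b")]
second = [("car-b", b"index-b"), ("car-c", b"index-c")]

incremental = on_upload_add(on_upload_add({}, first), second)
from_scratch = on_upload_add({}, first + second)
print(incremental == from_scratch)  # True
```

Both paths end with the same index, so starting with a full rebuild and optimising to incremental updates later should not change what readers of the index see.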

olizilla avatar Mar 15 '23 14:03 olizilla

This could be one for https://github.com/mikeal/multiblock

olizilla avatar Mar 15 '23 16:03 olizilla

Flagging that we can't simply concat them together, because we'd lose the information about which CAR file each block is in (we'd effectively be creating a single index for multiple CARs). So in the rollup we'd need to include the CID of the CAR that each block can be found in.
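
To illustrate the problem, here is a sketch that models each CAR index as a plain mapping from block CID to byte offset. This is a simplification for illustration, not the CARv2 index encoding, and the CIDs and offsets are placeholders.

```python
from typing import Dict, List, Tuple

CarIndex = Dict[str, int]  # block CID -> byte offset within one CAR

def naive_concat(indexes: List[CarIndex]) -> Dict[str, int]:
    """Merge indexes as-is: offsets survive, but not which CAR they refer to."""
    merged: Dict[str, int] = {}
    for index in indexes:
        merged.update(index)
    return merged

def rollup(indexes: Dict[str, CarIndex]) -> Dict[str, Tuple[str, int]]:
    """Merge indexes while tagging every entry with the CID of its CAR."""
    out: Dict[str, Tuple[str, int]] = {}
    for car_cid, index in indexes.items():
        for block_cid, offset in index.items():
            # Keep the first location seen if a block appears in several CARs.
            out.setdefault(block_cid, (car_cid, offset))
    return out

per_car = {
    "car-a": {"block-1": 59, "block-2": 180},
    "car-b": {"block-3": 59},
}

# Offset 59 appears twice here with no way to tell the CARs apart.
print(naive_concat(list(per_car.values())))

# The tagged rollup says where to fetch from.
print(rollup(per_car)["block-3"])  # ('car-b', 59)
```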

alanshaw avatar Apr 21 '23 11:04 alanshaw

+1 if “concatenate” means “write each index as a block in a CAR along with an object that maps the CAR CIDs to each Index CID” ;)

mikeal avatar Apr 21 '23 16:04 mikeal

There's a new index in town: https://github.com/alanshaw/cardex#multi-index-index

TL;DR: it's a CARv2 index that is a list of car-cid,carv2-index pairs.

I'm going to try out rollups using this index.
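
As a rough mental model of a "list of car-cid,carv2-index pairs", the sketch below keeps each CAR's index intact and pairs it with the CAR's CID. It is an in-memory illustration only: it does not reproduce the binary encoding or the API of the cardex library, and `MultiIndex` and `locate` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

CarIndex = Dict[str, int]                # block CID -> byte offset in one CAR
MultiIndex = List[Tuple[str, CarIndex]]  # (CAR CID, that CAR's index) pairs

def locate(multi: MultiIndex, block_cid: str) -> Optional[Tuple[str, int]]:
    """Return (CAR CID, offset) for a block, or None if no CAR contains it."""
    for car_cid, index in multi:
        offset = index.get(block_cid)
        if offset is not None:
            return car_cid, offset
    return None

# Placeholder CIDs and offsets, for illustration only.
multi: MultiIndex = [
    ("car-a", {"block-1": 59}),
    ("car-b", {"block-2": 59, "block-3": 240}),
]

print(locate(multi, "block-3"))  # ('car-b', 240)
print(locate(multi, "missing"))  # None
```

With a structure like this, a single fetch of the multi-index is enough to resolve any block in the upload to a CAR and an offset.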

alanshaw avatar Jun 05 '23 10:06 alanshaw