w3link
Concatenated upload index of all CAR indexes for a root CID
We hit issues when users send us a DAG split over > 1000 CARs, because we have to load a CAR index for each CAR before we can figure out where to fetch blocks from. If we create an upload index file for the root CID, as the concatenation of each CAR index, we only have to fetch a single file before we can start responding. I think this would solve the issue we're seeing in #46.
There remain edge cases where a single file is split over > 1000 CARs, but in those cases either the user is sending us CAR shards that are too small, or the file is massive. For example, if users stick with the 100MiB CAR shard size we provide, they'd upload a 32GiB DAG in 328 CARs, so we could tackle that as a lower-priority issue.
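For a rough check of those numbers, here is a quick sketch of the arithmetic. It assumes binary units and ignores CAR header and framing overhead, so real shard counts may be slightly higher; `shard_count` is just an illustrative helper, not anything in the codebase.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def shard_count(dag_bytes: int, shard_bytes: int = 100 * MIB) -> int:
    """Minimum number of CAR shards for a DAG, ignoring CAR framing overhead."""
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-dag_bytes // shard_bytes)

# A 32GiB DAG at the default 100MiB shard size.
print(shard_count(32 * GIB))  # 328

# Largest DAG (in GiB) that still fits in 1000 shards of 100MiB.
print(1000 * 100 * MIB / GIB)  # 97.65625
```

So with the default shard size, a DAG has to be close to 100GiB before it crosses the 1000 CAR mark.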
As a low-risk first pass, we could create the concatenated upload index after each call to upload/add in the upload API. It's fine if the user calls it multiple times and adds CARs incrementally. We can rebuild it from scratch each time, or read any existing one and make it smarter if we need to.
could be one for https://github.com/mikeal/multiblock
Flagging that we can't simply concat them together, because we'd lose the information about which CAR file each block is in (we'd effectively be creating a single index for multiple CARs). So the rollup would need to include, for each block, the CID of the CAR it can be found in.
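To make the point concrete, here is a minimal in-memory sketch of a rollup that keeps the CAR CID next to each block entry. The data shapes, the `Location` type, `build_rollup`, and the placeholder CID strings are all assumptions for illustration, not the real index encoding.

```python
from typing import Dict, NamedTuple, Tuple

class Location(NamedTuple):
    car_cid: str  # which CAR shard holds the block
    offset: int   # byte offset of the block within that CAR
    length: int   # byte length of the block

# Hypothetical per-CAR index shape: block CID -> (offset, length).
CarIndex = Dict[str, Tuple[int, int]]

def build_rollup(indexes: Dict[str, CarIndex]) -> Dict[str, Location]:
    """Merge per-CAR indexes, tagging every entry with its CAR CID."""
    rollup: Dict[str, Location] = {}
    for car_cid, index in indexes.items():
        for block_cid, (offset, length) in index.items():
            # Keep the first location seen if a block appears in several CARs.
            rollup.setdefault(block_cid, Location(car_cid, offset, length))
    return rollup

# Placeholder strings stand in for real CIDs.
indexes = {
    "car-a": {"block-1": (59, 100), "block-2": (159, 40)},
    "car-b": {"block-3": (59, 75)},
}
rollup = build_rollup(indexes)
print(rollup["block-3"])
```

Without the `car_cid` field, `block-1` and `block-3` would both just say "offset 59" with no way to tell which CAR to read from.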
+1 if “concatenate” means “write each index as a block in a CAR along with an object that maps the CAR CIDs to each Index CID” ;)
There's a new index in town: https://github.com/alanshaw/cardex#multi-index-index
TL;DR: it's a CARv2 index which is a list of (car-cid, carv2-index) pairs.
I'm going to try out rollups using this index.
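As a rough mental model of that shape, a list of (car-cid, index) pairs can be looked up as sketched below. This only models the structure in memory; it is not the cardex binary format or its API, and `find_block` plus the placeholder CID strings are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Index = Dict[str, int]                 # block CID -> byte offset in its CAR
MultiIndex = List[Tuple[str, Index]]   # list of (car-cid, index) pairs

def find_block(multi: MultiIndex, block_cid: str) -> Optional[Tuple[str, int]]:
    """Return (car_cid, offset) for the first CAR whose index has the block."""
    for car_cid, index in multi:
        offset = index.get(block_cid)
        if offset is not None:
            return car_cid, offset
    return None

# Placeholder strings stand in for real CIDs.
multi: MultiIndex = [
    ("car-a", {"block-1": 59, "block-2": 159}),
    ("car-b", {"block-3": 59}),
]
print(find_block(multi, "block-3"))  # ('car-b', 59)
print(find_block(multi, "missing"))  # None
```

One fetch of the rollup gives enough to answer "which CAR, and where in it" for any block under the root CID.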