forest
`archie` and `fuzzy` roadmaps
We've got two servers:
- archie. We want this to be an archival node for all of filecoin
- fuzzy. We want to use this for fuzz testing, maybe benchmarks. It's currently used in CI
You can read more about them here
Archie the archival node
Archie stores the entire filecoin graph as diffs[^1] from epoch 0 to 3090000 (3087000+3000)
archie@archie:~$ du -h /mnt/md0 # software raid0
19T /mnt/md0
archie@archie:~$ ls /mnt/md0 | sort --numeric --key=40 # "forest_diff_mainnet_YYYY-MM-DD_height_" is 39 characters
forest_diff_mainnet_2020-08-24_height_0+3000.forest.car.zst
forest_diff_mainnet_2020-08-25_height_3000+3000.forest.car.zst
...
forest_diff_mainnet_2023-07-31_height_3084000+3000.forest.car.zst
forest_diff_mainnet_2023-08-01_height_3087000+3000.forest.car.zst
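The file names in the listing above can be parsed to double-check the covered range (3087000+3000 = 3090000). This is a minimal sketch, not Forest code: the regex is inferred from the four names shown, and `DiffFile`, `parse_diff_name` and `coverage` are hypothetical helpers.

```python
import re
from typing import NamedTuple

# Pattern inferred from the listing above; the real naming scheme may differ.
DIFF_NAME = re.compile(
    r"^forest_diff_mainnet_(?P<date>\d{4}-\d{2}-\d{2})"
    r"_height_(?P<start>\d+)\+(?P<length>\d+)\.forest\.car\.zst$"
)


class DiffFile(NamedTuple):
    start: int   # epoch before the "+" in the file name
    length: int  # number after the "+" in the file name
    date: str
    name: str


def parse_diff_name(name: str) -> DiffFile:
    """Split a diff file name into its start epoch, length and date."""
    m = DIFF_NAME.match(name)
    if m is None:
        raise ValueError(f"not a diff file name: {name!r}")
    return DiffFile(int(m["start"]), int(m["length"]), m["date"], name)


def coverage(names):
    """Return (first_epoch, end_epoch, gaps) for a collection of diff file names.

    A gap is reported wherever one file's start + length does not equal the
    next file's start, after sorting numerically by start epoch.
    """
    files = sorted(parse_diff_name(n) for n in names)
    gaps = []
    for prev, cur in zip(files, files[1:]):
        if prev.start + prev.length != cur.start:
            gaps.append((prev.start + prev.length, cur.start))
    return files[0].start, files[-1].start + files[-1].length, gaps
```

Run against the full output of `ls /mnt/md0`, an empty `gaps` list would mean the diffs are contiguous. Run against only the four names shown above, the elided middle shows up as a single gap.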
- [ ] Archie can use its diffs in some kind of `CarBackedBlockstore`
  We probably want some kind of on-disk kv store that can handle the ~300GB[^2] of CIDs and their locations. We should not have a special binary - Archie should be running vanilla Forest, however that looks after this work. See also #3361
- [ ] Archie can serve blocks from its diffs
  We want a test that can fetch arbitrary spans from Archie. This is not throwaway code - Fuzzy will need it later.
- [ ] Archie stays current for system packages, taking security updates.
- [ ] Archie ready for GA without DoS-ing the office
  Archie lives in Berlin, with SSH proxied through Cloudflare. We need to proxy general Forest traffic through Cloudflare, or move Archie out of the office.
- [ ] #3525
- [ ] Archie stays current for Filecoin
  We need a story for getting the latest snapshot diffs, and having them indexed. Can we do it without downtime?
- [ ] Archie is redundant
  RAID 0 isn't really good enough for production - if we lose a disk we're down for ~days redownloading
- [ ] Archie is production-grade
  An Archie instance should be able to e.g. OOM without requiring manual intervention. E.g. a shared Kubernetes volume is read by workers, and updated by a single worker, perhaps.
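To make the "CIDs and their locations" idea from the first item concrete, here is a toy sketch of such an index using SQLite from the Python standard library. Everything in it is an assumption for illustration: the table layout, the `car_file`/`byte_offset` columns and the helper names are hypothetical, and what a "location" means inside a compressed `.forest.car.zst` file is left open. It says nothing about which store would actually cope with ~300GB.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the index. Pass a file path to keep it on disk."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS locations ("
        " cid TEXT PRIMARY KEY,"            # CID in its string form
        " car_file TEXT NOT NULL,"          # which diff file holds the block
        " byte_offset INTEGER NOT NULL)"    # hypothetical position in that file
    )
    return db


def record(db, cid, car_file, byte_offset):
    """Remember where a block lives, overwriting any previous entry."""
    db.execute(
        "INSERT OR REPLACE INTO locations VALUES (?, ?, ?)",
        (cid, car_file, byte_offset),
    )
    db.commit()


def locate(db, cid):
    """Return (car_file, byte_offset) for a CID, or None if it is unknown."""
    return db.execute(
        "SELECT car_file, byte_offset FROM locations WHERE cid = ?", (cid,)
    ).fetchone()
```

The point is only the shape of the lookup: a blockstore backed by the diffs would ask the index where a CID lives, then read the block from that file.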
Fuzzy the fuzz tester
- [x] #3514
- [ ] Fuzzy validates arbitrary blocks from Archie
  We'll need some `diff` and `snapshot` story here
- [ ] Fuzzy can run benchmarks?
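For fetching an arbitrary span of epochs, one piece of the story is working out which diff files are involved. A rough sketch, assuming files start at multiples of 3000 and each covers the half-open range `[start, start + 3000)`. It ignores the one-epoch overlap between consecutive files, and `diff_starts_for_span` is a hypothetical helper, not a Forest API.

```python
def diff_starts_for_span(first_epoch, last_epoch, step=3000):
    """Return the start heights of the diff files touching [first_epoch, last_epoch].

    Assumes files start at multiples of `step` and each covers `step` epochs.
    """
    if first_epoch < 0 or last_epoch < first_epoch:
        raise ValueError("expected 0 <= first_epoch <= last_epoch")
    first_start = (first_epoch // step) * step
    last_start = (last_epoch // step) * step
    return list(range(first_start, last_start + step, step))
```

A span that crosses a file boundary, e.g. epochs 2999 to 3000, needs both the `0+3000` and the `3000+3000` file under these assumptions.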
[^1]: TODO(aatifsyed): explain the graph, and diffs vs snapshots - what are we actually serving? Recall that there is exactly one epoch overlap between consecutive files
[^2]: TODO(aatifsyed): show your working.
Idea: archie as a pure IPFS server
We now generate snapshots so quickly that the upload bandwidth costs more than the hardware ($86 vs $350). Perhaps archie could handle snapshots as well.