(outdated) feat(iroh-sync): download policies and fetching missing blobs
This PR is outdated and superseded by #1870 and #2085. What still needs to be extracted is the part about fetching missing blobs.
Description
- Add per-document download policies that specify whether, and which, blobs should be downloaded
- Add a feature to retrieve the hashes of missing blobs per document
- Refactor the downloader to support a notion of `Group` labels for hashes and peers: when a new peer labeled with a group is added, the downloader tries to download all hashes in that group from it
- Add `ResourceHints` to indicate which peers and/or groups might have a resource, and `NodeHints` to indicate which groups a node belongs to and which resources it can or cannot provide
- Refactor the downloader to not queue individual resources, but instead operate on the list of nodes that are added and keep them busy with resources they might have
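To make the first two items more concrete, here is a minimal sketch of how a per-document download policy could decide which entries to fetch, and how the hashes of missing blobs could be derived from it. Everything in it is an assumption for illustration: `DownloadPolicy`, `Filter`, `Entry` and `missing_hashes` are hypothetical names, entries are reduced to a key and a hash, and the local blob store is modeled as a plain `HashSet`. None of this is the actual iroh-sync API.

```rust
use std::collections::HashSet;

/// Stand-in for a content hash (assumption: 32 raw bytes).
pub type Hash = [u8; 32];

/// Hypothetical filter over document entry keys.
#[derive(Debug, Clone)]
pub enum Filter {
    /// Matches every key that starts with the given bytes.
    Prefix(Vec<u8>),
    /// Matches exactly one key.
    Exact(Vec<u8>),
}

impl Filter {
    fn matches(&self, key: &[u8]) -> bool {
        match self {
            Filter::Prefix(prefix) => key.starts_with(prefix),
            Filter::Exact(exact) => key == exact.as_slice(),
        }
    }
}

/// Hypothetical per-document download policy.
#[derive(Debug, Clone)]
pub enum DownloadPolicy {
    /// Download nothing, except entries matching at least one filter.
    NothingExcept(Vec<Filter>),
    /// Download everything, except entries matching at least one filter.
    EverythingExcept(Vec<Filter>),
}

impl DownloadPolicy {
    pub fn should_download(&self, key: &[u8]) -> bool {
        match self {
            DownloadPolicy::NothingExcept(filters) => filters.iter().any(|f| f.matches(key)),
            DownloadPolicy::EverythingExcept(filters) => !filters.iter().any(|f| f.matches(key)),
        }
    }
}

/// Simplified document entry: a key pointing at the hash of its content.
pub struct Entry {
    pub key: Vec<u8>,
    pub hash: Hash,
}

/// Returns the hashes the policy wants but the local store lacks,
/// deduplicated and in entry order.
pub fn missing_hashes(entries: &[Entry], policy: &DownloadPolicy, local: &HashSet<Hash>) -> Vec<Hash> {
    let mut seen = HashSet::new();
    entries
        .iter()
        .filter(|entry| policy.should_download(&entry.key))
        .map(|entry| entry.hash)
        // Skip blobs we already have, and report each missing hash only once.
        .filter(|hash| !local.contains(hash) && seen.insert(*hash))
        .collect()
}
```

Deduplication matters here because several entries in a document can point at the same content hash.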
The refactor of the downloader turned out quite a bit bigger than I initially expected. I first tried to work the groups into the existing architecture (with `ProviderMap` and a queue of scheduled requests), but did not succeed. I then wrote a new internal, IO-less downloader `State` that keeps the maps of resources, groups and nodes. It completely removes the queue of scheduled requests, as well as the notion of provider and candidate peers. Instead, the mechanism is now as follows:
- Resources are added to the downloader with hints as to which `nodes` and `groups` provide the resource
- Nodes are added to the downloader with hints as to which groups they belong to and, if we have it, info on which resources they do or do not have
- The downloader then tries to keep all nodes busy, as far as the concurrency limits permit, by queuing resources for them according to the hints
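The scheduling rules above can be sketched as a small IO-less state machine: callers feed it events (resource added, node added, download finished) and it returns the downloads to start, without performing any IO itself. This is a hypothetical illustration, not the `State` from this PR. `ResourceHints` and `NodeHints` borrow their names from the description, but their fields are guesses, and the type aliases, `Action`, the two concurrency limits and the rule that a resource is fetched from at most one node at a time are all assumptions.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Stand-in identifiers (assumptions; real types are richer).
pub type Hash = u64;
pub type NodeId = u32;
pub type GroupId = u32;

/// Which nodes and groups might provide a resource (fields assumed).
#[derive(Debug, Default)]
pub struct ResourceHints {
    pub nodes: BTreeSet<NodeId>,
    pub groups: BTreeSet<GroupId>,
}

/// Which groups a node belongs to and what it is known to have or lack (fields assumed).
#[derive(Debug, Default)]
pub struct NodeHints {
    pub groups: BTreeSet<GroupId>,
    pub has: BTreeSet<Hash>,
    pub has_not: BTreeSet<Hash>,
}

#[derive(Debug, Default)]
struct NodeInfo {
    hints: NodeHints,
    active: usize, // downloads currently running on this node
}

/// Side effects for the caller to perform; the state itself does no IO.
#[derive(Debug, PartialEq)]
pub enum Action {
    StartDownload { hash: Hash, node: NodeId },
}

pub struct State {
    resources: BTreeMap<Hash, ResourceHints>,
    nodes: BTreeMap<NodeId, NodeInfo>,
    in_progress: BTreeSet<Hash>,
    max_per_node: usize,
    max_total: usize,
}

impl State {
    pub fn new(max_per_node: usize, max_total: usize) -> Self {
        State { resources: BTreeMap::new(), nodes: BTreeMap::new(), in_progress: BTreeSet::new(), max_per_node, max_total }
    }

    pub fn add_resource(&mut self, hash: Hash, hints: ResourceHints) -> Vec<Action> {
        let entry = self.resources.entry(hash).or_default();
        entry.nodes.extend(hints.nodes);
        entry.groups.extend(hints.groups);
        self.fill_nodes()
    }

    pub fn add_node(&mut self, node: NodeId, hints: NodeHints) -> Vec<Action> {
        let info = self.nodes.entry(node).or_default();
        info.hints.groups.extend(hints.groups);
        info.hints.has.extend(hints.has);
        info.hints.has_not.extend(hints.has_not);
        self.fill_nodes()
    }

    pub fn on_finished(&mut self, hash: Hash, node: NodeId, success: bool) -> Vec<Action> {
        self.in_progress.remove(&hash);
        if let Some(info) = self.nodes.get_mut(&node) {
            info.active = info.active.saturating_sub(1);
            // Remember the outcome so a node is not asked again for a hash it failed on.
            if success { info.hints.has.insert(hash); } else { info.hints.has_not.insert(hash); }
        }
        if success {
            // Completed resources leave the state; failed ones stay with their hints.
            self.resources.remove(&hash);
        }
        self.fill_nodes()
    }

    /// Candidate if known to have it, hinted directly, or sharing a group; never if known not to have it.
    fn might_have(node: NodeId, info: &NodeInfo, hash: Hash, hints: &ResourceHints) -> bool {
        if info.hints.has_not.contains(&hash) { return false; }
        info.hints.has.contains(&hash) || hints.nodes.contains(&node) || !hints.groups.is_disjoint(&info.hints.groups)
    }

    /// Hands each node resources it might have until a concurrency limit is reached.
    fn fill_nodes(&mut self) -> Vec<Action> {
        let mut actions = Vec::new();
        let node_ids: Vec<NodeId> = self.nodes.keys().copied().collect();
        for node in node_ids {
            loop {
                if self.in_progress.len() >= self.max_total { return actions; }
                let info = &self.nodes[&node];
                if info.active >= self.max_per_node { break; }
                let next = self.resources.iter()
                    .find(|(hash, hints)| !self.in_progress.contains(*hash) && Self::might_have(node, info, **hash, hints))
                    .map(|(hash, _)| *hash);
                match next {
                    Some(hash) => {
                        self.in_progress.insert(hash);
                        self.nodes.get_mut(&node).expect("node id came from the map").active += 1;
                        actions.push(Action::StartDownload { hash, node });
                    }
                    None => break,
                }
            }
        }
        actions
    }
}
```

Returning actions instead of spawning requests keeps every transition deterministic and unit-testable without a network, which is the usual motivation for an IO-less design.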
Missing:
- Restore `check_invariants`
- Write tests that actually test the download policies
Notes & open questions
Currently, resources remain in the state until they are completed. They are not removed on failures or cancelled intents, because removing them would also discard the hints we provided via `node_add`. If no intents are left or registered, the downloads won't be started, but the resource info with its node hints remains in memory. I'm not sure yet what the best strategy is for deciding if and when to prune these resource entries.
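The trade-off described above can be illustrated with a hypothetical sketch in which each resource entry tracks its open intents separately from its node hints. Cancelling the last intent makes the resource unschedulable but keeps the entry, so the hints survive if a new intent is registered later; only completion removes the entry. `Resources`, `IntentId` and all method names are assumptions, and no pruning strategy is implemented, since that question is still open.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Stand-in identifiers (assumptions).
pub type Hash = u64;
pub type NodeId = u32;
/// Hypothetical handle for one caller's request to download a resource.
pub type IntentId = u64;

#[derive(Debug, Default)]
struct ResourceEntry {
    intents: BTreeSet<IntentId>,
    /// Nodes reported (when they were added) to have this resource.
    node_hints: BTreeSet<NodeId>,
}

#[derive(Debug, Default)]
pub struct Resources {
    entries: BTreeMap<Hash, ResourceEntry>,
}

impl Resources {
    /// Registers an intent for `hash`, creating the entry if needed.
    pub fn register(&mut self, hash: Hash, intent: IntentId) {
        self.entries.entry(hash).or_default().intents.insert(intent);
    }

    /// Records that `node` has each of the given resources.
    pub fn add_node(&mut self, node: NodeId, has: &[Hash]) {
        for hash in has {
            self.entries.entry(*hash).or_default().node_hints.insert(node);
        }
    }

    /// Drops an intent but deliberately keeps the entry, so its node hints survive.
    pub fn cancel(&mut self, hash: Hash, intent: IntentId) {
        if let Some(entry) = self.entries.get_mut(&hash) {
            entry.intents.remove(&intent);
        }
    }

    /// Only a completed download removes the entry.
    pub fn complete(&mut self, hash: Hash) {
        self.entries.remove(&hash);
    }

    /// Resources without any live intent are kept in memory but never scheduled.
    pub fn schedulable(&self) -> Vec<Hash> {
        self.entries
            .iter()
            .filter(|(_, entry)| !entry.intents.is_empty())
            .map(|(hash, _)| *hash)
            .collect()
    }

    pub fn node_hints(&self, hash: Hash) -> Option<&BTreeSet<NodeId>> {
        self.entries.get(&hash).map(|entry| &entry.node_hints)
    }
}
```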
Change checklist
- [ ] Self-review.
- [ ] Documentation updates if relevant.
- [ ] Tests if relevant.