(outdated) feat(iroh-sync): download policies and fetching missing blobs

Open Frando opened this issue 2 years ago • 0 comments

This PR is outdated and superseded by #1870 and #2085. What still needs to be extracted is the part about fetching missing blobs.

Description

Add download policies per document to specify if/which blobs should be downloaded
Add feature to retrieve hashes of missing blobs per document
Refactor the downloader to support Group labels for hashes and peers: when a new peer labeled with a group is added, the downloader tries to download all hashes in that group
Add ResourceHints to indicate which peers and/or groups might have a resource, and NodeHints to indicate which groups a node belongs to and which resources it can or cannot provide
Refactor downloader to not queue individual resources, but instead operate on the list of nodes that are added, and keep them busy with resources that they might have
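
To make the first two points concrete, here is a minimal sketch of how a per-document policy and the "missing blobs" query could fit together. It is not the code from this PR: `DownloadPolicy`, its variants, `missing_hashes` and the `[u8; 32]` hash alias are hypothetical names chosen for illustration, and the document is modelled as a plain map from entry key to content hash.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a 32-byte content hash.
type Hash = [u8; 32];

/// Hypothetical per-document policy: which entries should have their blobs downloaded.
#[derive(Debug, Clone)]
enum DownloadPolicy {
    /// Download the blob of every entry.
    Everything,
    /// Download nothing automatically.
    Nothing,
    /// Download only blobs of entries whose key starts with one of these prefixes.
    KeyPrefixes(Vec<Vec<u8>>),
}

impl DownloadPolicy {
    /// Returns true if the entry with this key should be downloaded.
    fn matches(&self, key: &[u8]) -> bool {
        match self {
            DownloadPolicy::Everything => true,
            DownloadPolicy::Nothing => false,
            DownloadPolicy::KeyPrefixes(prefixes) => {
                prefixes.iter().any(|p| key.starts_with(p))
            }
        }
    }
}

/// Hashes that the policy asks for but that are not in the local store.
/// `entries` maps entry keys to content hashes (an assumed document model).
fn missing_hashes(
    entries: &BTreeMap<Vec<u8>, Hash>,
    local: &BTreeSet<Hash>,
    policy: &DownloadPolicy,
) -> BTreeSet<Hash> {
    entries
        .iter()
        .filter(|(key, hash)| policy.matches(key) && !local.contains(*hash))
        .map(|(_, hash)| *hash)
        .collect()
}
```

In this sketch the policy only filters by key; the result of `missing_hashes` is what would be handed to the downloader as the set of resources still wanted for the document.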

The refactor of the downloader turned out quite a bit bigger than I initially thought. I first tried to work the groups into the existing architecture (with ProviderMap and a queue of scheduled requests), but failed to do so. I then wrote a new internal downloader State which is IO-less and keeps the maps of resources, groups and nodes. It also completely removes the queue of scheduled requests, and the notion of Provider and Candidate peers. Instead, the mechanism is now as follows:

Resources are added to the downloader with hints as to which nodes and groups provide the resource
Nodes are added to the downloader with hints as to which groups they belong to and with info (if we have it) on which resources they do or do not have
The downloader then tries to keep all nodes busy, as far as the concurrency limits permit, by queuing resources according to the hints
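
The three steps above can be sketched as a small IO-less state machine. This is an illustration under assumptions, not the `State` from this PR: the integer type aliases, the field names, `max_per_node`, and the methods `resource_add`, `next_transfers` and `transfer_done` are hypothetical (only `node_add`, `ResourceHints` and `NodeHints` are names mentioned in this description), and a resource is assigned to at most one node at a time for simplicity.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical simplified identifiers; the real types are hashes and public keys.
type Hash = u64;
type NodeId = u32;
type Group = u32;

/// Which nodes and groups might have a resource.
#[derive(Default)]
struct ResourceHints {
    nodes: BTreeSet<NodeId>,
    groups: BTreeSet<Group>,
}

/// Which groups a node belongs to, and what it is known to have or not have.
#[derive(Default)]
struct NodeHints {
    groups: BTreeSet<Group>,
    has: BTreeSet<Hash>,
    has_not: BTreeSet<Hash>,
}

#[derive(Default)]
struct NodeInfo {
    hints: NodeHints,
    active: BTreeSet<Hash>, // transfers currently assigned to this node
}

/// IO-less state: it only decides what should happen, the caller performs the IO.
struct State {
    max_per_node: usize, // assumed per-node concurrency limit
    resources: BTreeMap<Hash, ResourceHints>,
    nodes: BTreeMap<NodeId, NodeInfo>,
    in_flight: BTreeSet<Hash>,
}

impl State {
    fn new(max_per_node: usize) -> Self {
        Self {
            max_per_node,
            resources: BTreeMap::new(),
            nodes: BTreeMap::new(),
            in_flight: BTreeSet::new(),
        }
    }

    /// Register a wanted resource, merging hints if it is already known.
    fn resource_add(&mut self, hash: Hash, hints: ResourceHints) {
        let entry = self.resources.entry(hash).or_default();
        entry.nodes.extend(hints.nodes);
        entry.groups.extend(hints.groups);
    }

    /// Register a node, merging hints if it is already known.
    fn node_add(&mut self, node: NodeId, hints: NodeHints) {
        let info = self.nodes.entry(node).or_default();
        info.hints.groups.extend(hints.groups);
        info.hints.has.extend(hints.has);
        info.hints.has_not.extend(hints.has_not);
    }

    /// Fill every node up to its limit with resources the hints say it might have.
    fn next_transfers(&mut self) -> Vec<(NodeId, Hash)> {
        let mut out = Vec::new();
        for (node_id, node) in self.nodes.iter_mut() {
            for (hash, hints) in self.resources.iter() {
                if node.active.len() >= self.max_per_node {
                    break;
                }
                if self.in_flight.contains(hash) || node.hints.has_not.contains(hash) {
                    continue;
                }
                let hinted = node.hints.has.contains(hash)
                    || hints.nodes.contains(node_id)
                    || !hints.groups.is_disjoint(&node.hints.groups);
                if hinted {
                    node.active.insert(*hash);
                    self.in_flight.insert(*hash);
                    out.push((*node_id, *hash));
                }
            }
        }
        out
    }

    /// Mark a transfer as completed and drop the resource from the state.
    fn transfer_done(&mut self, node: NodeId, hash: Hash) {
        if let Some(info) = self.nodes.get_mut(&node) {
            info.active.remove(&hash);
        }
        self.in_flight.remove(&hash);
        self.resources.remove(&hash);
    }
}
```

Note that there is no queue of scheduled requests here: calling `next_transfers` after every `resource_add`, `node_add` or `transfer_done` is enough to keep nodes busy, which is also what makes a newly added node of a group pick up the hashes of that group.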

Missing:

Restore the check_invariants
Write tests that actually test the download policies

Notes & open questions

Currently, resources remain in the state until they are completed. They are not removed on failures or cancelled intents, because removing them would lose the hints we provided via node_add. If no intents are left or registered, the downloads won't be started, but the resource info with node hints will remain in memory. Not sure yet what the best strategy is for deciding if and when to prune these resource entries.
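
The retention behaviour described above can be illustrated with a small sketch. All names here (`Resources`, `ResourceEntry`, `IntentId`, `add_hint`, `add_intent`, `cancel_intent`, `wanted`, `complete`) are hypothetical and only model the bookkeeping, not the actual implementation.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical simplified identifiers.
type Hash = u64;
type NodeId = u32;
type IntentId = u32;

#[derive(Default)]
struct ResourceEntry {
    intents: BTreeSet<IntentId>,   // who currently wants this resource
    node_hints: BTreeSet<NodeId>,  // nodes that might have it
}

#[derive(Default)]
struct Resources {
    entries: BTreeMap<Hash, ResourceEntry>,
}

impl Resources {
    fn add_hint(&mut self, hash: Hash, node: NodeId) {
        self.entries.entry(hash).or_default().node_hints.insert(node);
    }

    fn add_intent(&mut self, hash: Hash, intent: IntentId) {
        self.entries.entry(hash).or_default().intents.insert(intent);
    }

    /// Cancelling an intent keeps the entry, so the node hints are not lost.
    fn cancel_intent(&mut self, hash: Hash, intent: IntentId) {
        if let Some(entry) = self.entries.get_mut(&hash) {
            entry.intents.remove(&intent);
        }
    }

    /// Only resources with at least one intent are eligible for download.
    fn wanted(&self) -> Vec<Hash> {
        self.entries
            .iter()
            .filter(|(_, entry)| !entry.intents.is_empty())
            .map(|(hash, _)| *hash)
            .collect()
    }

    /// Completion is the only event that removes an entry in this sketch.
    fn complete(&mut self, hash: Hash) {
        self.entries.remove(&hash);
    }
}
```

In this model an entry without intents costs memory but is never downloaded; a pruning strategy would have to decide when such entries can be dropped without throwing away hints that may be needed again.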

Change checklist

[ ] Self-review.
[ ] Documentation updates if relevant.
[ ] Tests if relevant.

Nov 10 '23 11:11 Frando