[RFC] Parent selection based on node state awareness
Introduction
Feature request:
Dragonfly is an efficient, stable, and secure P2P-based file distribution and image acceleration tool. Currently, however, the parent used to download each piece is chosen on a first-come, first-served (FCFS) basis: whichever parent returns a piece's metadata first is the parent that piece is downloaded from. This selection method cannot react to changes in parent node state (network bandwidth, disk I/O), so it cannot make full use of the available bandwidth.
Therefore, I propose a parent selection method based on parent state awareness, which is described in detail below.
Use case:
UI Example:
Design
Architecture
The following is the overall architecture diagram of the design. It consists of three main parts: ParentStateSyncer, ParentStateServer, and PieceCollector.
Modules
- ParentStateServer: a background daemon thread on the upload server side. It periodically measures the local network bandwidth and disk bandwidth, computes the local node state, and sends the latest state to every connection in the SyncHost connection set, which is maintained with LRU.
- ParentStateSyncer: a background daemon thread on the client side. It uses an LRU cache to maintain the set of parents whose states need to be synchronized, and sends SyncHost requests to synchronize the state of every parent in the cache.
- PieceCollector: retrieves the states of the parents it is tracking from ParentStateSyncer, then selects the optimal parent and the corresponding piece to download.
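The LRU bookkeeping described for ParentStateSyncer can be sketched as follows. This is only an illustration under assumptions: the class name `ParentLRU`, its methods, and the use of a plain number as the "state" are hypothetical and are not taken from the Dragonfly code base.

```python
from collections import OrderedDict


class ParentLRU:
    """Hypothetical LRU set of parent host IDs tracked by a syncer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._states = OrderedDict()  # host_id -> latest known state (or None)

    def register(self, host_id):
        """Add a parent or mark it as recently used; return the evicted ID, if any."""
        if host_id in self._states:
            self._states.move_to_end(host_id)
            return None
        self._states[host_id] = None
        if len(self._states) > self.capacity:
            evicted, _ = self._states.popitem(last=False)
            return evicted
        return None

    def update(self, host_id, state):
        """Store a state pushed by the parent; ignore parents no longer tracked."""
        if host_id in self._states:
            self._states[host_id] = state

    def snapshot(self):
        """Return a copy of the tracked parents and their latest states."""
        return dict(self._states)


cache = ParentLRU(capacity=2)
cache.register("parent-a")
cache.register("parent-b")
cache.register("parent-a")            # parent-a becomes the most recently used
evicted = cache.register("parent-c")  # capacity exceeded, parent-b is evicted
cache.update("parent-a", 0.8)
```

In a real syncer, evicting a parent would presumably also close its SyncHost stream; that step is left out here.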
Download Process
- When a download starts, PieceCollector registers the scheduled parents in the LRU cache of ParentStateSyncer.
- ParentStateSyncer synchronizes the parents' states from ParentStateServer in the background.
- PieceCollector periodically refreshes the states of the parents it is tracking from ParentStateSyncer.
- PieceCollector picks the parent to download from, and the corresponding piece, according to the node selection method.
Node Selection Method
- Piece metadata that has been fetched is stored in a separate queue for each parent.
- A parent is selected with a random number, based on the parent states.
- [normal case] If the selected parent's queue contains piece metadata, take the first element directly.
- [queue empty] If the queue is empty, move on to the next parent queue in order.
- [piece finished] If the piece at the head of the queue has already been downloaded, skip it and keep skipping until an undownloaded piece is found or the queue is empty.
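The selection steps above can be sketched as follows. This is a minimal illustration, not the actual implementation: it assumes the "random number based on parent status" is a weighted random draw, that each parent's state is reduced to one non-negative score, and that the function name `select_piece` and its arguments are hypothetical.

```python
import random
from collections import deque


def select_piece(queues, weights, finished, rng=random):
    """Pick (parent, piece) using a weighted random draw over parents.

    queues:   dict parent_id -> deque of piece numbers
    weights:  dict parent_id -> non-negative state score (assumed scale)
    finished: set of piece numbers that were already downloaded
    """
    parents = list(queues)
    if not parents:
        return None
    scores = [weights.get(p, 0.0) for p in parents]
    if sum(scores) <= 0:
        scores = None  # no usable state yet: fall back to a uniform draw
    # Weighted random draw: a parent with a better score is chosen more often.
    start = rng.choices(range(len(parents)), weights=scores)[0]
    # Starting from the drawn parent, walk the parent queues in order.
    for offset in range(len(parents)):
        parent = parents[(start + offset) % len(parents)]
        queue = queues[parent]
        # [piece finished] drop head entries that were already downloaded.
        while queue and queue[0] in finished:
            queue.popleft()
        # [normal case] take the first pending piece of this parent.
        if queue:
            return parent, queue.popleft()
        # [queue empty] continue with the next parent queue.
    return None


queues = {"a": deque([1, 2]), "b": deque([2, 3])}
weights = {"a": 1.0, "b": 0.0}  # parent b is never drawn first in this example
finished = {1}

first = select_piece(queues, weights, finished)
finished.add(first[1])
second = select_piece(queues, weights, finished)
finished.add(second[1])
third = select_piece(queues, weights, finished)
```

Here piece 1 is skipped because it is already finished, piece 2 is taken from parent a, and piece 3 is then taken from parent b once a's queue is empty.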
Configuration
upload:
  # configuration for HostSyncer
  syncer:
    # enable indicates whether to enable HostSyncer.
    enable: true
    # interval is the interval to sync hosts' info.
    interval: 3s
    # cacheCapacity is the capacity of the LRU cache for HostSyncer gRPC connections, default is 50.
    cacheCapacity: 50
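As a rough illustration of how these three settings might be validated once the YAML has been parsed into a dictionary, consider the sketch below. The class `SyncerConfig`, the helper `parse_seconds`, and the defaults for `enable` and `interval` are assumptions made for the example; only the default capacity of 50 comes from the configuration comment above.

```python
from dataclasses import dataclass


def parse_seconds(text):
    """Parse durations such as "3s" or "500ms" into seconds (only these two units)."""
    if text.endswith("ms"):
        return float(text[:-2]) / 1000.0
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unsupported duration: {text!r}")


@dataclass
class SyncerConfig:
    enable: bool = False      # assumed default, not stated in the RFC
    interval: float = 3.0     # seconds; assumed default, taken from the example value
    cache_capacity: int = 50  # default stated in the configuration comment

    @classmethod
    def from_dict(cls, raw):
        """Build a config from the parsed `upload.syncer` mapping."""
        cfg = cls()
        if "enable" in raw:
            cfg.enable = bool(raw["enable"])
        if "interval" in raw:
            cfg.interval = parse_seconds(raw["interval"])
        if "cacheCapacity" in raw:
            cfg.cache_capacity = int(raw["cacheCapacity"])
        if cfg.interval <= 0 or cfg.cache_capacity <= 0:
            raise ValueError("interval and cacheCapacity must be positive")
        return cfg


cfg = SyncerConfig.from_dict({"enable": True, "interval": "3s", "cacheCapacity": 50})
```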
API Definition
message SyncHostRequest {
  // Host id.
  string host_id = 1;
  // Peer id.
  string peer_id = 2;
}

// DfdaemonUpload represents upload service of dfdaemon.
service DfdaemonUpload {
  // SyncHost sync parents state.
  rpc SyncHost(SyncHostRequest) returns (stream common.v2.Host);
}
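Because SyncHost is a server-streaming RPC, the client sends one request and then receives a sequence of host states over time. The sketch below imitates that pattern with a plain Python generator instead of a real gRPC stub; `HostState`, its `upload_bandwidth_used` field, and the `sync_host` function are hypothetical stand-ins and do not reflect the real `common.v2.Host` message.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical subset of the streamed host message."""
    host_id: str
    upload_bandwidth_used: float  # fraction in [0, 1]; an assumed field


def sync_host(host_id, peer_id, samples):
    """Stand-in for the server-streaming RPC: yields one state per sample.

    peer_id is accepted only to mirror the request shape; it is unused here.
    """
    for used in samples:
        yield HostState(host_id=host_id, upload_bandwidth_used=used)


latest = {}
for state in sync_host("parent-a", "peer-1", [0.2, 0.9, 0.4]):
    latest[state.host_id] = state  # the newest state overwrites the previous one
```

The consumer only ever keeps the most recent state per parent, which is what the piece collector needs when it chooses where to download from.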
actions:
- API definition, week1
- configuration, week1
- upload server, week1
- parent selector, week2
- piece collector, week2
- test, unit test & e2e test & stress test, week3
Impressive design! Regarding the part about perceiving changes in parent node state: collecting node metrics is a complex task. Perhaps we could use OpenTelemetry metrics to interface with the data collection daemons that usually already exist in most production environments, rather than implementing this part ourselves. That way, we would only need to design a mechanism that adjusts the scheduling weight of the current node based on those metrics.
That is a good suggestion, but we have two concerns. First, the node state data our method relies on (such as real-time bandwidth) needs to be highly up to date, which may not be achievable if it is collected through OpenTelemetry. Second, our approach is basic functionality of Dragonfly, and using OpenTelemetry could introduce excessive dependencies into the project.
Test locally
I set up a Dragonfly cluster locally using Docker to test the ParentSelector feature.
I started a seed peer and a peer as parents (bandwidth limited to 100mbps using tc), and ran iperf3 in the seed peer container to simulate its bandwidth being occupied. I then launched a local peer and used dfget to run file download tests through it.
The two test runs differ only in this setting:
Enable ParentSelector: download.parentSelector.enable=true;
Disable ParentSelector: download.parentSelector.enable=false;
Settings
Peers:
- A Seed Peer (running iperf3, as parent)
- A Peer (as parent)
- Local Peer (running dfget to test)
Target File:
Name: random_file (generated by dd if=/dev/urandom)
Size: 1GB
Result
Enable ParentSelector: 76s
Disable ParentSelector: 112s
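For reference, the relative improvement implied by these two timings can be worked out as below. The throughput figures assume that "1GB" means 1024 MiB, which the test description does not state explicitly.

```python
enabled_s = 76.0    # download time with ParentSelector enabled
disabled_s = 112.0  # download time with ParentSelector disabled
size_mib = 1024.0   # assumption: "1GB" is 1 GiB

reduction = (disabled_s - enabled_s) / disabled_s  # fraction of time saved
speedup = disabled_s / enabled_s                   # how many times faster
throughput_enabled = size_mib / enabled_s          # MiB/s
throughput_disabled = size_mib / disabled_s        # MiB/s

print(f"time reduction: {reduction:.1%}")                         # 32.1%
print(f"speedup: {speedup:.2f}x")                                 # 1.47x
print(f"throughput enabled: {throughput_enabled:.1f} MiB/s")      # 13.5 MiB/s
print(f"throughput disabled: {throughput_disabled:.1f} MiB/s")    # 9.1 MiB/s
```

These numbers come from a single run per setting, so they indicate the size of the effect rather than a statistically robust measurement.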
Video links
Enable ParentSelector: https://pan.baidu.com/s/1NExIVdwI2O8lbmyPsoy4aQ?pwd=mw8p
Disable ParentSelector: https://pan.baidu.com/s/14FzaBLcK1CwSrUA5n32egg?pwd=y6ej
Test