Oort
Error during execution when using Oort as the sampler
Hello,
Running with the Oort sampler raises an error that causes the program to stop.
Experiment: running the Google Speech benchmark with the default configuration from the conf.yml in the repo.
Error: "probabilities do not sum to 1", thrown by the resampleClients function of the parameter server:
sampledClientsRealTemp = sorted(clientSampler.resampleClients(numToSample, cur_time=epoch_count)) https://github.com/SymbioticLab/Oort/blob/78fc6d08a1c6f428a8ad1b41b826865a35ba01e1/training/param_server.py#L377
Log output when running with 100 clients (stops every time at Epoch 24):
[Screenshot: Screen Shot 2021-06-16 at 10 06 38 AM]
Log output when running with 10 clients (stops at Epoch 329):
[Screenshot: Screen Shot 2021-06-16 at 11 40 08 AM]
Hi, ahmedcs,
Sorry for the late reply. This is very likely due to an insufficient number of clients in this dataset, so please try a larger dataset. We have fixed this issue in our FedScale repo.
We plan to update Oort soon (hopefully in the next few weeks) and make Oort the execution backend of FedScale. In the meantime, feel free to use FedScale, which also supports the Oort selector. Please let us know if you have any questions!
The error is caused by the call to random.choice(): the weights assigned for sampling do not sum to 1 (in fact, they are all zeros). A simple fix is to check the sum of the probabilities before passing them to the function, and to skip the sampling and stop the exploration step when they are all zeros.
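The check described above could look roughly like the sketch below. It is not the code from Oort or FedScale: the helper name `sample_exploration_clients` and its parameters are made up for illustration, it assumes the sampling goes through NumPy's weighted choice (which raises "probabilities do not sum to 1"), and it assumes the weights are non-negative.

```python
import numpy as np

def sample_exploration_clients(client_ids, weights, num_to_sample, rng=None):
    """Hypothetical helper (not part of Oort): weighted sampling without
    replacement that skips sampling when the weights carry no mass."""
    rng = np.random.default_rng() if rng is None else rng
    weights = np.asarray(weights, dtype=float)  # assumed non-negative
    total = weights.sum()

    # Guard: with all-zero (or non-finite) weights there is nothing to
    # normalise, so skip the exploration step instead of calling choice().
    if len(client_ids) == 0 or num_to_sample <= 0:
        return []
    if not np.isfinite(total) or total <= 0:
        return []

    # Sampling without replacement needs at least `size` non-zero entries,
    # so cap the request at the number of clients with non-zero weight.
    size = min(num_to_sample, int(np.count_nonzero(weights)))
    probs = weights / total  # now sums to 1 up to floating-point error
    picked = rng.choice(len(client_ids), size=size, replace=False, p=probs)
    return [client_ids[i] for i in picked]
```

With this guard, a call such as `sample_exploration_clients([1, 2, 3], [0, 0, 0], 2)` returns an empty list instead of raising, and the caller can decide whether to fall back to another selection strategy for that round.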