Error in execution when using Oort as the sampler

[Open] ahmedcs opened this issue 3 years ago • 2 comments

Hello,

There is an issue when running with the Oort sampler that causes the program to stop.

Experiment: running the Google Speech benchmark with the default configuration from conf.yml in the repo.

Error: "probabilities do not sum to 1", raised from the resampleClients function of the parameter server at this call:

    sampledClientsRealTemp = sorted(clientSampler.resampleClients(numToSample, cur_time=epoch_count))

Source: https://github.com/SymbioticLab/Oort/blob/78fc6d08a1c6f428a8ad1b41b826865a35ba01e1/training/param_server.py#L377
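The message "probabilities do not sum to 1" appears to be NumPy's validation error from random.choice when its p argument is not a valid distribution. A minimal reproduction, assuming every candidate client has ended up with a sampling weight of zero (the client IDs below are made up for illustration):

```python
import numpy as np


def reproduce_zero_weight_error():
    """Trigger the same ValueError reported in this issue.

    The client IDs and zero weights are hypothetical stand-ins for a
    selector state in which every candidate has zero sampling weight.
    """
    clients = [10, 11, 12, 13]          # hypothetical client IDs
    weights = [0.0, 0.0, 0.0, 0.0]      # all zero, so they sum to 0 instead of 1
    try:
        np.random.choice(clients, 2, replace=False, p=weights)
    except ValueError as err:
        return str(err)                 # NumPy's validation message
    return None                         # not reached with all-zero weights


if __name__ == "__main__":
    print(reproduce_zero_weight_error())
```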

Log output when running with 100 clients (stops every time at Epoch 24):

[Screenshot: log output with 100 clients, 2021-06-16 10:06:38 AM]

Log output when running with 10 clients (stops at Epoch 329):

[Screenshot: log output with 10 clients, 2021-06-16 11:40:08 AM]

ahmedcs, Jun 16 '21 09:06

Hi, ahmedcs,

Sorry for the late reply. This is very likely caused by the dataset having too few clients, so please try a larger dataset. We have fixed this issue in our FedScale repo.

We plan to update Oort soon (hopefully in the next few weeks) and make Oort the execution backend of FedScale. In the meantime, feel free to use FedScale, which also supports the Oort selector. Please let us know if you have any questions!

fanlai0990, Jun 28 '21 04:06

The error is raised by the call to random.choice(), because the weights assigned for sampling do not sum to 1 (in fact, they are all zeros). A straightforward fix is to check the sum of the probabilities before passing them to the function, and to skip the exploration step when they are all zeros.
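The guard described above can be sketched as follows. This is a hypothetical helper, not Oort's actual selector code: the name sample_unexplored, its arguments, and the client-to-reward dictionary are assumptions for illustration. It skips exploration (returns an empty list) when the weights sum to zero, and otherwise normalizes them and draws clients without replacement.

```python
import numpy as np


def sample_unexplored(clients, rewards, num_to_pick, rng=None):
    """Pick up to num_to_pick clients, weighted by reward, without replacement.

    Hypothetical helper (not Oort's API). If the weights sum to zero, the
    exploration step is skipped instead of letting choice() raise
    "probabilities do not sum to 1".
    """
    rng = rng if rng is not None else np.random.default_rng()
    weights = np.asarray([rewards[c] for c in clients], dtype=float)
    total = weights.sum()
    if len(clients) == 0 or total <= 0:
        # Nothing to explore, or every weight is zero: skip exploration.
        return []
    probs = weights / total  # normalize so the probabilities sum to 1
    # Without replacement, choice() cannot pick more entries than have
    # non-zero probability, so cap the sample size.
    k = min(num_to_pick, int(np.count_nonzero(probs)))
    # Sample indices rather than IDs so arbitrary client ID types work.
    picked = rng.choice(len(clients), size=k, replace=False, p=probs)
    return [clients[i] for i in picked]
```

Returning an empty list lets the caller continue the round with whatever clients it has already selected; whether that is the right fallback inside Oort's selector is an assumption of this sketch.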

liecn, Aug 06 '21 00:08