LG-FedAvg
Failed to converge when changing num_users and frac
Description
When I change num_users to 10 and frac to 0.3 with --iid, which means 3 clients are chosen in each round, I find the model first gets better and then gets worse.
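For context, here is a minimal, self-contained sketch of how FedAvg-style client sampling typically works (the variable names are illustrative and not necessarily those used in main_fed.py): with num_users=10 and frac=0.3, three client indices are drawn per round, which matches the index triples printed in the log below.

import numpy as np

num_users, frac = 10, 0.3
m = max(int(frac * num_users), 1)                       # 3 clients per round
idxs_users = np.random.choice(range(num_users), m, replace=False)
print(idxs_users)                                       # e.g. [5 6 0], as in the log below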
Reproduce
$ python main_fed.py --dataset mnist --model mlp --num_classes 10 --epochs 1000 --lr 0.05 --num_users 10 --shard_per_user 2 --frac 0.3 --local_ep 1 --local_bs 8 --results_save run1 --iid
Output
device: cuda:0
MLP(
(layer_input): Linear(in_features=784, out_features=512, bias=True)
(relu): ReLU()
(dropout): Dropout(p=0.5, inplace=False)
(layer_hidden1): Linear(in_features=512, out_features=256, bias=True)
(layer_hidden2): Linear(in_features=256, out_features=256, bias=True)
(layer_hidden3): Linear(in_features=256, out_features=128, bias=True)
(layer_out): Linear(in_features=128, out_features=10, bias=True)
(softmax): Softmax(dim=1)
)
Round 0, lr: 0.050000, [5 6 0]
Round 0, Average loss 2.038, Test loss 1.794, Test accuracy: 67.63
Round 1, lr: 0.050000, [6 4 5]
Round 1, Average loss 1.748, Test loss 1.611, Test accuracy: 85.05
Round 2, lr: 0.050000, [7 9 4]
Round 2, Average loss 1.761, Test loss 1.717, Test accuracy: 74.39
Round 3, lr: 0.050000, [7 4 9]
Round 3, Average loss 1.856, Test loss 1.843, Test accuracy: 61.74
Round 4, lr: 0.050000, [9 2 5]
Round 4, Average loss 1.948, Test loss 1.863, Test accuracy: 59.83
Round 5, lr: 0.050000, [2 6 7]
Round 5, Average loss 2.039, Test loss 1.990, Test accuracy: 47.11
Round 6, lr: 0.050000, [0 7 2]
Round 6, Average loss 2.025, Test loss 1.997, Test accuracy: 46.39
Round 7, lr: 0.050000, [4 3 2]
Round 7, Average loss 2.017, Test loss 2.104, Test accuracy: 35.68
Round 8, lr: 0.050000, [2 9 1]
Round 8, Average loss 2.128, Test loss 2.113, Test accuracy: 34.82
Round 9, lr: 0.050000, [2 7 5]
Round 9, Average loss 2.127, Test loss 2.190, Test accuracy: 27.09
Round 10, lr: 0.050000, [1 9 7]
Round 10, Average loss 2.194, Test loss 2.239, Test accuracy: 22.21
Round 11, lr: 0.050000, [0 2 3]
Round 11, Average loss 2.236, Test loss 2.186, Test accuracy: 27.53
Round 12, lr: 0.050000, [3 9 5]
Round 12, Average loss 2.188, Test loss 2.108, Test accuracy: 35.29
Round 13, lr: 0.050000, [3 6 5]
Round 13, Average loss 2.172, Test loss 2.237, Test accuracy: 22.45
Round 14, lr: 0.050000, [9 8 4]
Round 14, Average loss 2.258, Test loss 2.175, Test accuracy: 28.61
Round 15, lr: 0.050000, [2 7 1]
Round 15, Average loss 2.178, Test loss 2.161, Test accuracy: 29.99
Round 16, lr: 0.050000, [9 6 4]
Round 16, Average loss 2.192, Test loss 2.280, Test accuracy: 18.10
Round 17, lr: 0.050000, [2 4 0]
Round 17, Average loss 2.284, Test loss 2.125, Test accuracy: 33.60
Round 18, lr: 0.050000, [4 1 0]
Round 18, Average loss 2.226, Test loss 2.352, Test accuracy: 10.94
Round 19, lr: 0.050000, [6 0 7]
Round 19, Average loss 2.355, Test loss 2.352, Test accuracy: 10.94
Round 20, lr: 0.050000, [1 8 6]
Round 20, Average loss 2.351, Test loss 2.339, Test accuracy: 12.24
Round 21, lr: 0.050000, [1 2 3]
Round 21, Average loss 2.338, Test loss 2.339, Test accuracy: 12.24
Round 22, lr: 0.050000, [9 3 1]
Round 22, Average loss 2.340, Test loss 2.339, Test accuracy: 12.24
Round 23, lr: 0.050000, [4 2 0]
Round 23, Average loss 2.337, Test loss 2.339, Test accuracy: 12.24
Round 24, lr: 0.050000, [8 1 5]
You can solve this problem simply by setting lr_decay=0.95 and replacing
w_local, loss = local.train(net=net_local.to(args.device))
with
w_local, loss = local.train(net=net_local.to(args.device), lr=lr)
or by choosing a more powerful optimizer than SGD.
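For illustration, a minimal sketch of where this change lands in a FedAvg-style training loop. Only the local.train(..., lr=lr) call and the lr_decay value come from the suggestion above; the surrounding loop, net_glob, idxs_users, and the copy.deepcopy step are assumed placeholders, not the repo's exact code.

import copy

lr = args.lr                                   # 0.05 in the run above
for rnd in range(args.epochs):
    for idx in idxs_users:                     # clients sampled this round (assumed)
        net_local = copy.deepcopy(net_glob)    # start from the global model (assumed)
        # pass the decayed lr so local SGD does not keep using the initial args.lr
        w_local, loss = local.train(net=net_local.to(args.device), lr=lr)
    lr *= args.lr_decay                        # e.g. --lr_decay 0.95 per the suggestion

With lr_decay=0.95, the per-round learning rate drops from 0.05 to roughly 0.003 after about 55 rounds, which damps the oscillation visible in the log where the rate stays fixed at 0.05 every round.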