stackoverflow_lr trainer problem
I am using FedML to train stackoverflow_lr with the hyperparameters recommended in the original paper 'Adaptive Federated Optimization' (learning rate = 100, optimizer = SGD), but I cannot get the expected results. I wonder if the implementation of the trainer in FedML differs from TFF's. I noticed that you use clip_grad_norm_ to avoid NaN loss; otherwise the loss does not even drop. Is this operation optional, or is it also used in TFF? I would appreciate any advice on the training process.
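For context, this is roughly how I understand the clipping to be applied in a training step (a minimal sketch, not FedML's actual trainer; the model dimensions and max_norm value are my assumptions):

```python
import torch
import torch.nn as nn

# Placeholder multi-label logistic regression model for the Stack Overflow
# tag-prediction task; the input/output sizes here are assumed, not FedML's.
model = nn.Linear(10000, 500)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=100.0)

def train_step(x, y, max_norm=1.0):  # max_norm=1.0 is an assumed value
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    # Rescale gradients so their global L2 norm is at most max_norm;
    # without this, lr=100 can blow up the weights and produce NaN loss.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()
```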
@ZSL98 Have you resolved the problem? Our results here seem reasonable: https://doc.fedml.ai/simulation/benchmark/BENCHMARK_simulation.html
The issue has already been addressed.