Bug when using seq2seq in FedNLP.
Hello, I'm using FedNLP for some experiments, but I find that the ROUGE score does not improve between epochs. At first I suspected the problem was on my side, but I ran the demo and saw the same behavior. Here is the example:
And the next time it runs the test:
Is this caused by some config setting or something else? Or can you give me some tips?
@Luoyang144 Normally, you need to check the entire training/test accuracy/loss curve. Sometimes it is normal for the optimization result to be the same between two iterations (epochs, rounds, etc.).
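For example, something like the minimal sketch below (not FedNLP's built-in logging, just an illustration) can collect the test metric after every communication round so you can look at the whole curve instead of comparing two adjacent rounds:

```python
# Minimal sketch: record the test metric after each round and plot the curve.
import matplotlib.pyplot as plt

rouge_per_round = []  # append the evaluated test ROUGE-L after each round

def plot_curve(scores, path="rouge_curve.png"):
    plt.plot(range(1, len(scores) + 1), scores, marker="o")
    plt.xlabel("communication round")
    plt.ylabel("test ROUGE-L")
    plt.savefig(path)
```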
@chaoyanghe But the final result is the same as before, so I suspect the model is not improving at all. In that case, I guess there is some bug in the learning process?
@zuluzazu Hi Mrigank, please check this bug.
@Luoyang144 This may be because you are using only 1 client at each round. One client per round does not give the server enough useful information to aggregate, and learning will get stuck. In my demo I used 1 client because I did not have enough GPU memory for 5 clients. You need to try at least 6-8 clients at each round. Even after that, if you still think there is a bug, feel free to reach out here.
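For intuition, here is a rough sketch of FedAvg-style aggregation (illustrative only, not FedNLP's internal code): with a single sampled client per round, the weighted average collapses to that one client's weights, so the global model just follows whichever client happened to be sampled.

```python
from collections import OrderedDict

def fed_avg(client_states, client_num_samples):
    """Sample-weighted average of client model state_dicts (PyTorch tensors)."""
    total = sum(client_num_samples)
    avg = OrderedDict()
    for key in client_states[0]:
        avg[key] = sum(
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_num_samples)
        )
    return avg

# With 1 client the sum reduces to that client's weights, i.e. the server
# just copies a single local model; with 6-8 clients each round the update
# reflects several data partitions, which is what makes the average useful.
```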
@zuluzazu Hello, I tried setting 6 clients per round, but got the same bad result. Is there any other config that needs to change?
@zuluzazu Hello, will you be looking into this problem? It has really confused me.
Hi @Luoyang144, when I trained it, it was converging. I currently have department orientations, so I will try to check this over the weekend.
@Luoyang144 I am pretty certain that the convergence issue is most likely due to hyperparameters. In the meantime, can you please do some hyperparameter tuning, e.g. decreasing the learning rate and changing the batch size? In my experience, federated settings are very sensitive to hyperparameters, so it would be great if you could run a tuning sweep, and I will also try to check for any bug over the weekend.
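If it helps, a rough sweep could look like the sketch below (the `train_and_eval` stub is hypothetical; wire it to however you launch your FedNLP run and have it return the final test ROUGE):

```python
import itertools

def train_and_eval(lr, batch_size, clients_per_round=8):
    # Hypothetical placeholder: launch one full federated run with these
    # hyperparameters and return the final test ROUGE-L.
    return 0.0

learning_rates = [5e-5, 1e-4, 5e-4]
batch_sizes = [4, 8, 16]

results = {}
for lr, bs in itertools.product(learning_rates, batch_sizes):
    results[(lr, bs)] = train_and_eval(lr, bs)

best = max(results, key=results.get)
print("best (lr, batch_size):", best, "ROUGE-L:", results[best])
```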
@zuluzazu Thank you, I will try to change some parameters.
@zuluzazu Hello, this weekend I tried tuning some hyperparameters, such as the learning rate and client_number, but they all gave bad results, even worse than the original parameters.
@Luoyang144 Does your ROUGE score improve if you do centralized training? If not, then the issue is with the model and not with the FedNLP code itself. Can you check?
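In case it's useful, a quick way to check that (this uses Google's `rouge-score` package, `pip install rouge-score`, rather than FedNLP's own metric code) is to evaluate after each centralized epoch and see whether the average F-measure moves:

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

def avg_rouge_l(predictions, references):
    # Average ROUGE-L F-measure over the test set.
    scores = [
        scorer.score(ref, pred)["rougeL"].fmeasure
        for pred, ref in zip(predictions, references)
    ]
    return sum(scores) / len(scores)

# Call this after every centralized epoch; if the number does not move,
# the problem is in the model/training setup rather than the federation.
```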
@Luoyang144 Were you able to resolve the issue?