FedScale Fix async

Why are these changes needed?

1. Model testing is somehow missing.
2. Model accuracy behaves strangely over the course of training.

Related issue number

Checks

- [ ] I've included any doc changes needed for https://fedscale.readthedocs.io/en/latest/
- [ ] I've made sure the following tests are passing.
Testing Configurations
- [ ] Dry Run (20 training rounds & 1 evaluation round)
- [ ] Cifar 10 (20 training rounds & 1 evaluation round)
- [ ] Femnist (20 training rounds & 1 evaluation round)

Aug 12 '22 07:08 fanlai0990

Thank you for the fix!

Aug 12 '22 19:08 ewenw

> Thank you for the fix!

Thanks a lot for your feedback! This PR is still WIP, since we noticed that the weird accuracy issue has not been fully addressed (although I rewrote almost the entire async implementation). I will fix it and address your comments ASAP. Thanks for your patience!

Aug 15 '22 16:08 fanlai0990

Hi @AmberLJC, do you have any TensorBoard results for async handy? It would be helpful to attach them to the PR. Thanks!

Aug 24 '22 17:08 ewenw

> Hi @AmberLJC, do you have any TensorBoard results for async handy? It would be helpful to attach them to the PR. Thanks!

Aug 24 '22 18:08 AmberLJC

Do you know how the convergence over virtual clock time compares with synchronous training with similar parameters?

Aug 24 '22 20:08 ewenw

> Do you know how the convergence over virtual clock time compares with synchronous training with similar parameters?

Let me try it out. I previously had some results, but I want to make a fair comparison again.

Aug 24 '22 20:08 AmberLJC
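
One way to frame the async vs. synchronous comparison discussed above is to compare both runs on the same virtual clock axis, for example by time-to-accuracy. The following is only a minimal sketch, assuming each run logs (virtual clock time, test accuracy) pairs. The helper names `time_to_accuracy` and `accuracy_at` are hypothetical and not part of FedScale, and the two curves are synthetic placeholders rather than measured results.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple

# One logged point: (virtual clock time in seconds, test accuracy in [0, 1]).
Curve = Sequence[Tuple[float, float]]


def time_to_accuracy(curve: Curve, target: float) -> Optional[float]:
    """Return the first logged virtual clock time at which accuracy >= target."""
    for t, acc in sorted(curve):
        if acc >= target:
            return t
    return None  # the run never reached the target


def accuracy_at(curve: Curve, t: float) -> Optional[float]:
    """Return the most recent logged accuracy at or before virtual time t."""
    points = sorted(curve)
    times = [p[0] for p in points]
    idx = bisect_right(times, t)
    return points[idx - 1][1] if idx > 0 else None


# Synthetic placeholder curves, for illustration only (NOT measured results).
sync_run = [(0.0, 0.10), (100.0, 0.30), (200.0, 0.45), (300.0, 0.55)]
async_run = [(0.0, 0.10), (60.0, 0.25), (120.0, 0.40), (180.0, 0.50), (240.0, 0.56)]

target = 0.50
print("sync  time-to-accuracy:", time_to_accuracy(sync_run, target))
print("async time-to-accuracy:", time_to_accuracy(async_run, target))
print("async accuracy at t=200:", accuracy_at(async_run, 200.0))
```

Comparing at matched virtual clock times (rather than at matched round numbers) matters here because an async "round" and a synchronous round generally do not cover the same amount of simulated time.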

@AmberLJC Could you please help review and test the latest commit? Training gets stuck because the model (version) is missing on executors. Thanks!

Sep 02 '22 03:09 fanlai0990
