FedScale
Fix async
Why are these changes needed?
1. Model testing is missing.
2. Model accuracy behaves strangely over the course of training.
Related issue number
Checks
- [ ] I've included any doc changes needed for https://fedscale.readthedocs.io/en/latest/
- [ ] I've made sure the following tests are passing.
- Testing Configurations
    - [ ] Dry Run (20 training rounds & 1 evaluation round)
    - [ ] Cifar 10 (20 training rounds & 1 evaluation round)
    - [ ] Femnist (20 training rounds & 1 evaluation round)
Thank you for the fix!
Thanks a lot for your feedback! This PR is still WIP: the weird accuracy issue has not been fully resolved yet, even though I rewrote almost the entire async implementation. I will fix it and address your comments ASAP. Thanks for your patience!
Hi @AmberLJC do you have any tensorboard results for async handy? It would be helpful to attach it with the PR. Thanks!
(image attached)
Do you know how the convergence over virtual clock time compares with synchronous training with similar parameters?
Let me try it out. I previously had some results, but I want to make a fair comparison again.
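One way to make that comparison concrete is to report the virtual clock time at which each run first reaches a target accuracy. The following is a minimal sketch, not FedScale code: `time_to_accuracy` is a hypothetical helper, and the clock and accuracy lists are placeholder values for illustration only, not measurements from this PR.

```python
from typing import Optional, Sequence


def time_to_accuracy(
    clock: Sequence[float], accuracy: Sequence[float], target: float
) -> Optional[float]:
    """Return the first virtual clock time at which accuracy >= target.

    Hypothetical helper: `clock` and `accuracy` are assumed to be parallel
    lists of evaluation points sorted by time. Returns None if the target
    is never reached.
    """
    if len(clock) != len(accuracy):
        raise ValueError("clock and accuracy must have the same length")
    for t, acc in zip(clock, accuracy):
        if acc >= target:
            return t
    return None


# Placeholder values for illustration only; NOT results from this PR.
sync_clock = [100.0, 200.0, 300.0, 400.0]
sync_acc = [0.10, 0.25, 0.40, 0.52]
async_clock = [50.0, 100.0, 150.0, 200.0, 250.0]
async_acc = [0.08, 0.18, 0.30, 0.41, 0.50]

target = 0.40
print("sync :", time_to_accuracy(sync_clock, sync_acc, target))
print("async:", time_to_accuracy(async_clock, async_acc, target))
```

For a fair comparison, both runs would need the same model, dataset, and client settings, with the evaluation points taken from each run's own logs.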
@AmberLJC Can you please help review and test the last commit? Training gets stuck because the model (version) is missing on the executors. Thanks!