Fix FedProx MNIST tutorial for PyTorch
Summary
Models trained with different mu values in the FedProx optimizer end up with identical weights, even though different mu values should produce different models. This indicates that the proximal term of the FedProx algorithm is not being applied correctly.
Type of Change (Mandatory)
Specify the type of change being made.
- Bug fix : https://github.com/securefederatedai/openfl/issues/823
Description (Mandatory)
1. Incorrect Timing of the set_old_weights Call
In the current notebook implementation, set_old_weights is called after gradient computation but right before the optimizer step. This is problematic because:
- It sets the reference weights (w_old) to be nearly identical to the current weights
- When the optimizer step is executed, the proximal term mu * (param - w_old_param) becomes effectively zero
- This nullifies the effect of different mu values, resulting in identical training outcomes
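The effect described above can be reproduced with a framework-free sketch of one FedProx SGD step, w <- w - lr * (grad + mu * (w - w_old)). The function name and the concrete numbers below are illustrative, not taken from the OpenFL codebase:

```python
# Hypothetical one-parameter FedProx SGD step:
#   w <- w - lr * (grad + mu * (w - w_old))
def fedprox_step(w, grad, w_old, lr=0.1, mu=0.0):
    return w - lr * (grad + mu * (w - w_old))

w, grad = 2.0, 1.0

# Buggy ordering: w_old is captured right before the step, so
# w_old == w and the proximal term mu * (w - w_old) is exactly
# zero for every mu -- all mu values give the same update.
buggy = [fedprox_step(w, grad, w_old=w, mu=mu) for mu in (0.0, 0.5, 1.0)]
assert len(set(buggy)) == 1  # identical result regardless of mu

# Correct ordering: w_old is the global model received at the start
# of the round (1.0 here), so different mu values pull w toward it
# with different strengths.
fixed = [fedprox_step(w, grad, w_old=1.0, mu=mu) for mu in (0.0, 0.5, 1.0)]
assert len(set(fixed)) == 3  # mu now changes the update
```

This is why the notebook must call set_old_weights with the incoming global weights before local training starts, not right before the optimizer step.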
2. Missing Initial w_old Value in Optimizer
The optimizers (FedProxOptimizer and FedProxAdam) don't initialize the w_old parameter in their constructors. If step() is called before set_old_weights, accessing the uninitialized w_old may raise an error.
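One way to make step() safe regardless of call order is to snapshot the initial parameters as the default w_old in the constructor. The class below is a hypothetical, minimal scalar sketch of that guard, not the actual OpenFL optimizer code:

```python
class FedProxSGDSketch:
    """Hypothetical minimal FedProx-style optimizer illustrating
    the w_old initialization guard."""

    def __init__(self, params, lr=0.1, mu=0.5):
        self.params = list(params)
        self.lr, self.mu = lr, mu
        # Initialize w_old in the constructor so step() works even
        # if set_old_weights() is never called: default to a snapshot
        # of the initial parameters.
        self.w_old = list(self.params)

    def set_old_weights(self, weights):
        # Called once per round with the incoming global weights.
        self.w_old = list(weights)

    def step(self, grads):
        # w <- w - lr * (grad + mu * (w - w_old))
        self.params = [
            p - self.lr * (g + self.mu * (p - po))
            for p, g, po in zip(self.params, grads, self.w_old)
        ]

# Calling step() before set_old_weights() no longer fails; the
# proximal term is simply zero on the very first step.
opt = FedProxSGDSketch([2.0], lr=0.1, mu=0.5)
opt.step([1.0])
```

With this guard, the first step degenerates to plain SGD (proximal term zero), and subsequent rounds use the global weights supplied via set_old_weights.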
Testing
- Tested locally.
Thanks @nisha987, this is looking good. I had one more minor comment regarding mu.
Can you also rebase and sign off on your commits? https://github.com/securefederatedai/openfl/pull/1634/checks?check_run_id=42564995654
Added signoff to all commits.