Replacing BN layer with AccumBN layer results in poorer convergence
Describe the bug
In the latest release, gradient-accumulator==0.5.2, a method was added to add accumulation support to existing BN layers. However, when attempting to use it in production, models seem to struggle to converge. We should benchmark this layer to verify that it actually works as expected, and perhaps add unit tests that catch whether the approximation is too poor for production use, before merging the PR to the main branch. A sketch of such a benchmark is given below.
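For reference, a minimal sketch of the kind of benchmark meant here: the same toy model trained once with regular BN and once with AccumBN, comparing final training loss. The architecture, MNIST data, and `ACCUM_STEPS = 4` are illustrative assumptions, not part of this report:

```python
import tensorflow as tf
from gradient_accumulator import GradientAccumulateModel
from gradient_accumulator.layers import AccumBatchNormalization

ACCUM_STEPS = 4  # illustrative value; any accum_steps > 1 applies

def build_model(use_accum_bn: bool) -> tf.keras.Model:
    # Identical architectures; only the normalization layer differs.
    norm = (AccumBatchNormalization(accum_steps=ACCUM_STEPS)
            if use_accum_bn else tf.keras.layers.BatchNormalization())
    inputs = tf.keras.Input(shape=(28, 28, 1))
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(inputs)
    x = norm(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    if use_accum_bn:
        # Wrap so gradients are also accumulated over ACCUM_STEPS batches.
        model = GradientAccumulateModel(accum_steps=ACCUM_STEPS,
                                        inputs=model.input,
                                        outputs=model.output)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0

for use_accum_bn in (False, True):
    hist = build_model(use_accum_bn).fit(x_train, y_train,
                                         batch_size=32, epochs=3, verbose=0)
    print("AccumBN" if use_accum_bn else "BN",
          "final loss:", hist.history["loss"][-1])
```

If the AccumBN variant converges noticeably worse than plain BN here, that would reproduce the issue deterministically enough to turn into a unit test.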
Expected behavior
Swapping the BN layer with AccumBN should be seamless, transfer the old weights to the new layer, and (in general) yield better convergence than regular BN for accum_steps > 1. The intended weight transfer is sketched below.
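A minimal sketch of the expected weight transfer between a built BN layer and a fresh AccumBN layer. The variable-name matching (`gamma`, `beta`, `moving_mean`, `moving_variance`) is an assumption about AccumBatchNormalization's internals and should be verified against the layer source:

```python
import tensorflow as tf
from gradient_accumulator.layers import AccumBatchNormalization

# A trained (here: freshly built) BN layer whose statistics we want to keep.
old_bn = tf.keras.layers.BatchNormalization()
old_bn.build((None, 64))

# Fresh AccumBN layer built for the same input shape.
accum_bn = AccumBatchNormalization(accum_steps=4)
accum_bn.build((None, 64))

# Copy matching variables by name. The assumption is that AccumBN keeps
# gamma/beta/moving statistics under Keras-like names; any extra
# accumulator variables it holds are left untouched.
for key in ("gamma", "beta", "moving_mean", "moving_variance"):
    src = next(w for w in old_bn.weights if key in w.name)
    dst = next(w for w in accum_bn.weights if key in w.name)
    dst.assign(src)
```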
Desktop (please complete the following information):
- OS: Ubuntu
- Version: 20.04
- Python: 3.8.10
- TensorFlow: 2.11.0