Replacing BN layer with AccumBN layer results in poorer convergence
Describe the bug
In the latest release, gradient-accumulator==0.5.2, a method was added to add accumulation support to existing BN layers. However, when attempting to use it in production, models seem to struggle to converge. We should benchmark this layer to verify that it actually works as expected, and perhaps add unit tests that catch whether the approximation is too poor for production use, before merging the PR to the main branch. A sketch of such a benchmark is given below.
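For reference, a minimal sketch of the kind of benchmark meant here: the same toy model trained once with regular BN and once with AccumBN, comparing final training loss. The architecture, MNIST data, and `ACCUM_STEPS = 4` are illustrative assumptions, not part of this report:

```python
import tensorflow as tf
from gradient_accumulator import GradientAccumulateModel
from gradient_accumulator.layers import AccumBatchNormalization

ACCUM_STEPS = 4  # illustrative value; any accum_steps > 1 applies

def build_model(use_accum_bn: bool) -> tf.keras.Model:
    # Identical architectures; only the normalization layer differs.
    norm = (AccumBatchNormalization(accum_steps=ACCUM_STEPS)
            if use_accum_bn else tf.keras.layers.BatchNormalization())
    inputs = tf.keras.Input(shape=(28, 28, 1))
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(inputs)
    x = norm(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    if use_accum_bn:
        # Wrap so gradients are also accumulated over ACCUM_STEPS batches.
        model = GradientAccumulateModel(accum_steps=ACCUM_STEPS,
                                        inputs=model.input,
                                        outputs=model.output)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0

for use_accum_bn in (False, True):
    hist = build_model(use_accum_bn).fit(x_train, y_train,
                                         batch_size=32, epochs=3, verbose=0)
    print("AccumBN" if use_accum_bn else "BN",
          "final loss:", hist.history["loss"][-1])
```

If the AccumBN variant converges noticeably worse than plain BN here, that would reproduce the issue deterministically enough to turn into a unit test.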
Expected behavior
Swapping the BN layer with AccumBN should be seamless, transfer the old weights to the new layer, and (in general) yield better convergence than regular BN for accum_steps > 1. The intended weight transfer is sketched below.
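A minimal sketch of the expected weight transfer between a built BN layer and a fresh AccumBN layer. The variable-name matching (`gamma`, `beta`, `moving_mean`, `moving_variance`) is an assumption about AccumBatchNormalization's internals and should be verified against the layer source:

```python
import tensorflow as tf
from gradient_accumulator.layers import AccumBatchNormalization

# A trained (here: freshly built) BN layer whose statistics we want to keep.
old_bn = tf.keras.layers.BatchNormalization()
old_bn.build((None, 64))

# Fresh AccumBN layer built for the same input shape.
accum_bn = AccumBatchNormalization(accum_steps=4)
accum_bn.build((None, 64))

# Copy matching variables by name. The assumption is that AccumBN keeps
# gamma/beta/moving statistics under Keras-like names; any extra
# accumulator variables it holds are left untouched.
for key in ("gamma", "beta", "moving_mean", "moving_variance"):
    src = next(w for w in old_bn.weights if key in w.name)
    dst = next(w for w in accum_bn.weights if key in w.name)
    dst.assign(src)
```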
Desktop (please complete the following information):
- OS: Ubuntu
- Version: 20.04
- Python: 3.8.10
- TensorFlow: 2.11.0