GradientAccumulator

Replacing BN layer with AccumBN layer results in poorer convergence

andreped opened this issue on Sep 11, 2023 · 0 comments

Describe the bug In the latest release, gradient-accumulator==0.5.2, a method was added for adding gradient accumulation support to existing BN layers.
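
For reference, a minimal sketch of how the accum-aware layer is meant to be used, following the package docs; note that the helper that rewrites BN layers in an existing model (the method mentioned above) may be named differently in 0.5.2:

```python
import tensorflow as tf
from gradient_accumulator.layers import AccumBatchNormalization

# Build a toy model where AccumBatchNormalization is used as a
# drop-in replacement for tf.keras.layers.BatchNormalization.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, input_shape=(16,)),
    AccumBatchNormalization(accum_steps=4),  # accum-aware BN
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dense(1),
])
```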

However, when attempting to use it in production, models seem to struggle to converge. We should benchmark this layer to verify that it actually works as expected, and perhaps add unit tests that catch whether the approximation is too poor for production use before merging the PR into the main branch. A rough sketch of such a benchmark is given below.
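
A possible shape for the benchmark/unit test: train the same toy architecture once with regular BN and once with AccumBN wrapped in GradientAccumulateModel (batch size divided by accum_steps so the effective batch matches), then compare final losses. Class names follow the package docs; the data, architecture, and tolerance are arbitrary placeholders:

```python
import numpy as np
import tensorflow as tf
from gradient_accumulator import GradientAccumulateModel
from gradient_accumulator.layers import AccumBatchNormalization

def build(accum_steps):
    inputs = tf.keras.Input(shape=(16,))
    x = tf.keras.layers.Dense(32)(inputs)
    if accum_steps > 1:
        x = AccumBatchNormalization(accum_steps=accum_steps)(x)
    else:
        x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation("relu")(x)
    outputs = tf.keras.layers.Dense(1)(x)
    model = tf.keras.Model(inputs, outputs)
    if accum_steps > 1:
        # Wrap so gradients are accumulated over accum_steps batches.
        model = GradientAccumulateModel(accum_steps=accum_steps,
                                        inputs=model.input,
                                        outputs=model.output)
    model.compile(optimizer="sgd", loss="mse")
    return model

rng = np.random.default_rng(42)
X = rng.normal(size=(512, 16)).astype("float32")
y = rng.normal(size=(512, 1)).astype("float32")

# Same effective batch size: 32 vs 8 * accum_steps(=4).
loss_bn = build(1).fit(X, y, batch_size=32, epochs=5,
                       verbose=0).history["loss"][-1]
loss_accum = build(4).fit(X, y, batch_size=8, epochs=5,
                          verbose=0).history["loss"][-1]
assert abs(loss_bn - loss_accum) < 0.1  # placeholder tolerance
```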

Expected behavior Swapping the BN layer with AccumBN should be seamless: the old layer's weights should transfer to the new layer, and convergence should (in general) be better than regular BN for accum_steps > 1.
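
A sketch of what the weight transfer could look like, matching variables by name suffix (gamma, beta, moving_mean, moving_variance) rather than by position, since AccumBN may own extra accumulator variables; the helper and the matching strategy are illustrative assumptions, not the package API:

```python
import tensorflow as tf
from gradient_accumulator.layers import AccumBatchNormalization

def transfer_bn_weights(old_bn, accum_steps, input_shape):
    """Create an AccumBN layer and copy the old BN layer's state into it."""
    new_bn = AccumBatchNormalization(accum_steps=accum_steps)
    new_bn.build(input_shape)  # create variables before copying
    # Index the old layer's variables by name suffix, e.g. "gamma:0".
    old_vars = {w.name.split("/")[-1]: w for w in old_bn.weights}
    for w in new_bn.weights:
        key = w.name.split("/")[-1]
        if key in old_vars:
            w.assign(old_vars[key])
    return new_bn

# Usage: both layers built on the same input shape.
old_bn = tf.keras.layers.BatchNormalization()
old_bn.build((None, 64))
new_bn = transfer_bn_weights(old_bn, accum_steps=4, input_shape=(None, 64))
```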

Desktop:

  • OS: Ubuntu
  • Version: 20.04
  • Python: 3.8.10
  • TensorFlow: 2.11.0
