Diminishing beta parameter values in BatchNorm
@stu1130
I am running the BatchNorm section and made gamma and beta the two learnable parameters, as the section describes. Although the results turn out fine, beta ends up with really low values.
This is the book's result:
And this is mine:
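For reference, here is a minimal sketch of the kind of from-scratch batch norm the section describes, with gamma and beta as the two learnable parameters (written in PyTorch for brevity; the shapes and names here are illustrative assumptions, not the book's exact code):

```python
import torch

def batch_norm(X, gamma, beta, eps=1e-5):
    """Simplified batch norm for the fully connected case (training mode only)."""
    mean = X.mean(dim=0)
    var = ((X - mean) ** 2).mean(dim=0)
    X_hat = (X - mean) / torch.sqrt(var + eps)
    # gamma (scale) and beta (shift) are the two learnable parameters.
    return gamma * X_hat + beta

# Toy usage: beta is initialized to zero, and it is beta that reportedly
# stays close to zero after training in the first BatchNorm layer.
gamma = torch.ones(4, requires_grad=True)
beta = torch.zeros(4, requires_grad=True)
X = torch.randn(8, 4)
Y = batch_norm(X, gamma, beta)
print(Y.shape, beta)
```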
I can reproduce the issue, and both PyTorch and MXNet show the same problem. It could be caused by vanishing gradients. Only the first BatchNorm layer has the issue; we need to dive deeper.
It is reproducible with the general (from-scratch) BatchNorm implementation. Investigating now.
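One way to check the "only the first BatchNorm" observation might be to compare the learned beta magnitudes per layer after training. A rough PyTorch sketch (the LeNet-style layout mirrors the book's BatchNorm section, but the layer indices and shapes are my assumptions):

```python
import torch
from torch import nn

# LeNet with BatchNorm, roughly as in the book's section.
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.BatchNorm2d(6), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.BatchNorm2d(16), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2), nn.Flatten(),
    nn.Linear(16 * 4 * 4, 120), nn.BatchNorm1d(120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.BatchNorm1d(84), nn.Sigmoid(),
    nn.Linear(84, 10))

# After training, compare beta (the BatchNorm bias) across layers;
# a much smaller mean |beta| in the first layer would match the report.
for i, m in enumerate(net):
    if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
        print(i, m.bias.detach().abs().mean().item())
```

Looking at the gradient magnitudes of those same bias tensors during training would also help tell whether this is vanishing gradients or simply the optimum for beta being near zero in that layer.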
@lanking520 do we have any update?