tch-rs
Fix BatchNorm scaling and bias behavior
- Scaling factor is initialized to 1 instead of [0, 1] uniform. I discovered that `running_var` and the scaling factor converge to very small values. Here is an example screenshot showing `running_var` decreasing to very small, constant values. This could be caused by the scaling factor being poorly initialized from a [0, 1] uniform distribution. This PR follows the PyTorch implementation (code) and initializes it to the constant 1 instead.
- Affine transformation becomes optional. In PyTorch you can optionally disable the affine transformation by setting `affine=False`. This PR makes `ws_init` and `bs_init` optional.
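To illustrate both changes, here is a minimal, dependency-free sketch of a single-channel batch norm in training mode. It does not use the tch-rs API: the `Init`, `BatchNormConfig`, and `BatchNorm` types below are hypothetical stand-ins, and only the field names `ws_init` and `bs_init` come from this PR. The defaults (scale 1, bias 0) mirror PyTorch's affine defaults, and `None` plays the role of `affine=False`.

```rust
/// Hypothetical initializer; only the constant case is needed for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Init {
    Const(f64),
}

impl Init {
    fn value(self) -> f64 {
        match self {
            Init::Const(c) => c,
        }
    }
}

/// Hypothetical config: `None` disables the corresponding affine parameter.
#[derive(Clone, Copy, Debug)]
struct BatchNormConfig {
    ws_init: Option<Init>,
    bs_init: Option<Init>,
    eps: f64,
}

impl Default for BatchNormConfig {
    fn default() -> Self {
        Self {
            ws_init: Some(Init::Const(1.0)), // scale starts at 1, not uniform [0, 1]
            bs_init: Some(Init::Const(0.0)), // bias starts at 0
            eps: 1e-5,
        }
    }
}

/// Single-channel batch norm using batch statistics only (no running stats).
struct BatchNorm {
    ws: Option<f64>,
    bs: Option<f64>,
    eps: f64,
}

impl BatchNorm {
    fn new(cfg: BatchNormConfig) -> Self {
        Self {
            ws: cfg.ws_init.map(Init::value),
            bs: cfg.bs_init.map(Init::value),
            eps: cfg.eps,
        }
    }

    fn forward(&self, xs: &[f64]) -> Vec<f64> {
        let mu = mean(xs);
        let std = (variance(xs) + self.eps).sqrt();
        xs.iter()
            .map(|x| {
                let x_hat = (x - mu) / std;
                // Each affine step is skipped when its parameter is absent.
                let scaled = self.ws.map_or(x_hat, |w| w * x_hat);
                self.bs.map_or(scaled, |b| scaled + b)
            })
            .collect()
    }
}

fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Biased (population) variance, as used for batch normalization.
fn variance(xs: &[f64]) -> f64 {
    let m = mean(xs);
    xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / xs.len() as f64
}
```

With a scale of 1 the layer's output keeps roughly unit variance, whereas a scale `w` below 1 shrinks the output variance by a factor of `w * w`. This only shows how the initial scale affects the output; it does not reproduce the `running_var` behavior reported above.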