Deep-Residual-Network-For-MXNet
Is the bias term in the conv layers supposed to be disabled, as in the original paper?
I saw that in the original model they disable the bias in the conv layers and add a bias in the scale layers. Since MXNet's batch norm layer already has both a scale and a bias term, I am wondering whether it makes any difference if the conv bias is not disabled. A minimal sketch of the pattern I mean is below.
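For reference, here is a minimal sketch of the conv → batch norm pattern in question (layer names and shapes are made up, not the repo's actual code), with the conv bias disabled via `no_bias=True`:

```python
import mxnet as mx

# Hypothetical conv -> BN -> ReLU block: the conv bias is disabled
# because BatchNorm's beta term already provides a learnable bias.
data = mx.sym.Variable('data')
conv = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3),
                          stride=(1, 1), pad=(1, 1), no_bias=True,
                          name='conv1')
bn = mx.sym.BatchNorm(data=conv, fix_gamma=False, name='bn1')
relu = mx.sym.Activation(data=bn, act_type='relu', name='relu1')
```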
Thanks for your point of view.
I have no idea what the "scale layers" mentioned by @horserma are. Could you explain, or recommend some material? Thanks!
@LaoAnchor In the original residual network model (the Caffe release), they use a batch normalization layer followed by a separate scale layer. Unlike the standard formulation in the original batch normalization paper, Caffe's norm layer only subtracts the mean and divides by the standard deviation; the learnable scale and bias come from the scale layer afterwards. So the two layers together, batch norm plus scale, do the job of one standard batch norm layer. Hope it helps.
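This also explains why the conv bias can safely be dropped: batch normalization subtracts the per-channel mean, so any constant bias added by the preceding conv is cancelled out. A quick NumPy sketch (illustrative values only, not from the repo):

```python
import numpy as np

def batch_norm(a, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize over the batch axis, then apply scale and bias,
    # mirroring what a standard batch norm layer computes.
    mean = a.mean(axis=0)
    var = a.var(axis=0)
    return gamma * (a - mean) / np.sqrt(var + eps) + beta

x = np.random.randn(64, 16)  # activations: (batch, channels)
b = 3.0                      # a constant per-channel bias

# Adding a constant bias shifts the mean by the same amount, so the
# normalized outputs are (numerically) identical with or without it:
print(np.allclose(batch_norm(x), batch_norm(x + b)))  # True
```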
@horserma Thanks, I got that!