About parameter sensitivity
@graykode , I am very grateful for your source code! However, I have found that your implementation is very sensitive to the network's parameters, for example:
- In the batch_normalization layer, `trainable` must be set to `False`; when it is set to `trainable=True`, the results drop a lot.
- In the conv2d layer, `padding` must be set to `"valid"`; the results also drop when the padding is set to `"same"`.
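For reference, here is a minimal pure-Python sketch (not from your repository; `conv_output_size` and `batch_norm` are hypothetical helpers) of what the two settings change. The `trainable=False` case in TensorFlow also freezes the learned scale/offset, which this sketch does not model; it only shows the batch-statistics vs. moving-statistics distinction, plus how `"valid"` vs `"same"` padding changes the feature-map size:

```python
import math

def conv_output_size(input_size, kernel_size, stride, padding):
    """Spatial output size of a conv2d layer under the usual TF padding conventions."""
    if padding == "valid":
        # No zero-padding: every kernel position must fit entirely inside the input.
        return math.floor((input_size - kernel_size) / stride) + 1
    if padding == "same":
        # Zero-padded so the output size is ceil(input_size / stride).
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding: {padding}")

def batch_norm(x, moving_mean, moving_var, training, eps=1e-3):
    """Normalize with batch statistics (training) or moving statistics (inference)."""
    if training:
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
    else:
        mean, var = moving_mean, moving_var
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# "valid" shrinks the feature map while "same" preserves it,
# so downstream layer shapes (and thus results) differ:
print(conv_output_size(7, 3, 1, "valid"))  # 5
print(conv_output_size(7, 3, 1, "same"))   # 7
```

The shape change alone means the two padding modes are not drop-in replacements for each other, which may partly explain the sensitivity you observed.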
This seems very strange to me, and I can't understand the phenomenon: since the key to this algorithm is not the design of the network structure, it shouldn't be this sensitive to the network's parameters. Could you explain this behavior in detail? Thanks for your kindness!