宋全旺


I think you are right, because the padding makes each convolution output depend only on the inputs at or before its own time step. For example, with kernel_size=2 and dilation=4, the left padding should be (kernel_size - 1) * dilation = 4.
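To make the padding rule concrete, here is a minimal pure-Python sketch of a single-channel causal dilated 1-D convolution, assuming zero padding applied on the left only. The function names `causal_padding` and `causal_dilated_conv1d` are hypothetical, chosen for illustration, and are not from any particular library. It shows that with left padding of (kernel_size - 1) * dilation, output t only reads x[t - (kernel_size - 1) * dilation], ..., x[t].

```python
def causal_padding(kernel_size: int, dilation: int) -> int:
    # Left padding so that the output at time t sees only inputs at times <= t.
    return (kernel_size - 1) * dilation


def causal_dilated_conv1d(x, weights, dilation):
    """Hypothetical single-channel causal dilated 1-D convolution.

    x: input sequence (list of numbers)
    weights: kernel of length kernel_size; weights[-1] multiplies x[t]
    Returns an output sequence with the same length as x.
    """
    k = len(weights)
    pad = causal_padding(k, dilation)
    padded = [0.0] * pad + list(x)  # zero-pad on the left only
    out = []
    for t in range(len(x)):
        # padded[t + i * dilation] corresponds to x[t - (k - 1 - i) * dilation],
        # so the last tap (i = k - 1) lands exactly on x[t] and never beyond it.
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w * padded[t + i * dilation]
        out.append(acc)
    return out


print(causal_padding(2, 4))
print(causal_dilated_conv1d([1, 2, 3, 4, 5, 6], [1.0, 1.0], dilation=4))
```

With kernel_size=2 and dilation=4, each output is x[t - 4] + x[t] (treating out-of-range inputs as 0), so changing a later input such as x[5] cannot affect any earlier output.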

I think he means that when y or y_hat equals 0, using the loss (nonzero_loss) is not reasonable, so he assigns a comparably large fixed loss of 1 instead.
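A minimal sketch of that fallback, under loudly stated assumptions: the actual definition of nonzero_loss is not given here, so the stand-in below is a hypothetical absolute log ratio |log(y_hat / y)|, chosen only because it is undefined when either value is 0 and assumes positive inputs. The wrapper name `safe_loss` and its `zero_penalty` parameter are also hypothetical.

```python
import math


def nonzero_loss(y: float, y_hat: float) -> float:
    # Hypothetical stand-in for the real nonzero_loss: absolute log ratio.
    # Assumes y and y_hat are positive; it is undefined when either is 0.
    return abs(math.log(y_hat / y))


def safe_loss(y: float, y_hat: float, zero_penalty: float = 1.0) -> float:
    # If either value is 0, nonzero_loss cannot be computed meaningfully,
    # so return a fixed, comparably large penalty instead.
    if y == 0 or y_hat == 0:
        return zero_penalty
    return nonzero_loss(y, y_hat)


print(safe_loss(0.0, 3.0))
print(safe_loss(2.0, 2.0))
```

The fixed value only makes sense if it is large relative to typical values of the real nonzero_loss; that is the "comparably large" part of the reasoning.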