
caffe2 conv - cudnnAddTensor for bias generates wrong output! (FP16, cuDNN 6.0, in special cases)

Open kangdongh opened this issue 7 years ago • 6 comments

cudnnAddTensor produces incorrect values in the output blob :(

Specification, in my case: FP16, input tensor (NCHW) = (8, 256, 28, 28), output tensor (NCHW) = (8, 512, 28, 28), 1x1 kernel (it almost always occurs with 1x1 kernels)

With batch N = 1~3 it works correctly, but it produces wrong output when N >= 4.

Tested on Jetson TX2 and on a server with a GTX 1080 Ti.

It may be due to cuDNN, but I'm filing this issue for awareness.

kangdongh avatar Oct 17 '17 18:10 kangdongh

Might be a downstream problem? cc @slayton58 if you guys have a TX2/1080Ti on hand to check :)

Yangqing avatar Oct 24 '17 02:10 Yangqing

Ok, to clarify - the call to cudnnAddTensor is coming from the bias addition in a convolution operator?

In which case you actually want (all in NCHW): (8, 512, 28, 28) + (1, 512, 1, 1) in fp16?

slayton58 avatar Oct 24 '17 13:10 slayton58

I solved this problem with my own CUDA implementation for adding the bias, and it works well :) (though it's a little slower than cuDNN).

kangdongh avatar Oct 25 '17 09:10 kangdongh

@slayton58 Do you recall any changes landing recently that could have solved this?

pietern avatar Dec 12 '17 05:12 pietern

@pietern Still waiting on more information from @kangdongh; otherwise I can't comment.

slayton58 avatar Dec 12 '17 14:12 slayton58

@slayton58 Sorry for the late response :(. Actually, I'm not very familiar with using GitHub issues. First, I found a large error in the results that varied with batch size (1 vs 8) when I tried to test VGG + Tucker decomposition + FP16 with a bigger batch size for a small contest. Then I debugged to pinpoint where the problem came from, and found that the error originated in certain convolution operators that have a bias term.

However, I replaced the cudnnAddTensor(~) call in the cuDNN conv operator with my own CUDA code to work around this. The underlying issue still remains; to clarify, it occurs when:

  1. FP16 is used,
  2. the batch size is bigger than a certain threshold, and
  3. the cuDNN conv operator is used with a bias term (so it calls cudnnAddTensor).
  4. It was corrected when I replaced the caffe2 conv code in conv_op_cudnn.cc, lines 610 ~ 618, with another implementation.
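For reference, the operation being replaced is a per-channel bias broadcast in NCHW layout: output (N, C, H, W) += bias (1, C, 1, 1). A minimal CPU sketch of that computation (the function name is my own, and float stands in for fp16; the actual workaround was a CUDA kernel substituted for the cudnnAddTensor call):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the bias broadcast discussed above:
// out has NCHW layout with size N*C*H*W, bias has C elements,
// and bias[c] is added to every spatial position of channel c
// in every image of the batch.
void AddBiasNCHW(std::vector<float>& out, const std::vector<float>& bias,
                 std::size_t N, std::size_t C, std::size_t H, std::size_t W) {
  assert(out.size() == N * C * H * W);
  assert(bias.size() == C);
  for (std::size_t n = 0; n < N; ++n) {
    for (std::size_t c = 0; c < C; ++c) {
      const std::size_t base = (n * C + c) * H * W;  // start of channel c, image n
      for (std::size_t i = 0; i < H * W; ++i) {
        out[base + i] += bias[c];
      }
    }
  }
}
```

On the GPU the same mapping is typically done with one thread per output element, recovering the channel as `(idx / (H * W)) % C`; a correct replacement kernel only needs this indexing, independent of batch size.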

kangdongh avatar Dec 13 '17 03:12 kangdongh