
oneflow.nn.Fold may produce inconsistent results when run with and without requires_grad_(True)

Status: Open. xxxyyyzzz12345 opened this issue 1 year ago (2 comments).

Summary

oneflow.nn.Fold may produce inconsistent results when run with and without requires_grad_(True).

Running the following snippet prints a nonzero count (count != 0):

import oneflow

count = 0
input = oneflow.rand(2, 72, 16, dtype=oneflow.float64).cuda()
# Same values, but this copy is tracked by autograd
input_grad = input.clone().requires_grad_(True)
for i in range(100):
    mod = oneflow.nn.Fold(dilation=1, kernel_size=3, output_size=[8, 8], padding=1, stride=2)
    output = oneflow.sum(mod(input))
    output_grad = oneflow.sum(mod(input_grad))
    # Exact equality check between the two scalar sums
    if not (output == output_grad):
        count += 1
print(count)
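To see why exact equality is fragile here, it helps to recall what Fold computes: it scatters each sliding block back into the output image and adds overlapping values together (the "col2im" operation). The following is a minimal pure-Python sketch for a single sample. It assumes the standard Fold layout documented for torch.nn.Fold (channel-major rows, then kernel row, then kernel column), and fold2d is a hypothetical helper written for illustration, not a OneFlow API. Feeding it all ones shows how many block values land on each output pixel with the issue's settings.

```python
def fold2d(cols, channels, output_size, kernel_size, stride=1, padding=0, dilation=1):
    """Hypothetical pure-Python col2im (Fold) for one sample.

    cols: list of channels * kernel_size**2 rows, each holding one value per
    sliding block (block index runs row-major over the block grid).
    Returns a nested list indexed as out[c][y][x].
    """
    height, width = output_size
    k = kernel_size
    # Number of sliding blocks along each spatial dimension.
    blocks_h = (height + 2 * padding - dilation * (k - 1) - 1) // stride + 1
    blocks_w = (width + 2 * padding - dilation * (k - 1) - 1) // stride + 1
    if len(cols) != channels * k * k or any(len(r) != blocks_h * blocks_w for r in cols):
        raise ValueError("cols has the wrong shape for these Fold parameters")

    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for c in range(channels):
        for ki in range(k):
            for kj in range(k):
                row = cols[c * k * k + ki * k + kj]
                for bi in range(blocks_h):
                    for bj in range(blocks_w):
                        y = bi * stride - padding + ki * dilation
                        x = bj * stride - padding + kj * dilation
                        # Positions that fall in the padding are dropped.
                        if 0 <= y < height and 0 <= x < width:
                            # Overlapping blocks accumulate into the same pixel.
                            out[c][y][x] += row[bi * blocks_w + bj]
    return out


# Issue settings: kernel_size=3, stride=2, padding=1, output 8x8 -> 16 blocks.
# Feeding all ones turns each output pixel into a count of contributions.
ones = [[1.0] * 16 for _ in range(9)]
counts = fold2d(ones, channels=1, output_size=(8, 8), kernel_size=3, stride=2, padding=1)
print(counts[0][0][0])  # 1.0
print(counts[0][1][1])  # 4.0
```

With kernel_size=3 and stride=2 the windows overlap, so pixels such as (1, 1) are the sum of four values. If the requires_grad path and the plain path accumulated those values in different orders (for example, through different CUDA kernels), float64 results could differ in the last bits. Whether OneFlow 0.7.0 actually does this is an assumption; the thread does not confirm it.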

System Information

  • What is your OneFlow installation (pip, source, dockerhub): pip
  • OS: Ubuntu 20.04.2 LTS
  • OneFlow version (run python3 -m oneflow --doctor): 0.7.0+cu112
  • Python version: 3.8.8
  • CUDA driver version: 11.4

xxxyyyzzz12345 · Jul 27 '22 06:07

[Two screenshots of the script output]

I ran your script as-is and got a different value of count. This looks like an ordinary floating-point rounding difference rather than a bug, though I may be wrong. If you still think it is a bug, could you provide input data that reproduces it?

Since you are comparing sums of floating-point numbers, a different order of operations can produce a slightly different result. I recommend comparing with np.allclose instead, which lets you set an appropriate tolerance, like this:

import numpy as np
# detach() so the autograd-tracked tensor can be converted to NumPy
np.allclose(output.numpy(), output_grad.detach().numpy(), atol=1e-4, rtol=1e-4)
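The order-of-operations point can be checked without a GPU. In this small pure-Python illustration, allclose_like is a hypothetical stand-in written for the example (not a NumPy or OneFlow function). It applies the element-wise test that the NumPy documentation gives for np.allclose, abs(a - b) <= atol + rtol * abs(b), but unlike NumPy it does not broadcast or treat NaN specially.

```python
import random


def allclose_like(a, b, rtol=1e-5, atol=1e-8):
    """Hypothetical pure-Python stand-in for np.allclose on flat lists.

    Uses NumPy's documented test: abs(a - b) <= atol + rtol * abs(b).
    """
    return len(a) == len(b) and all(
        abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b)
    )


# Floating-point addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6

# Summing the same values in two orders, as two code paths might.
random.seed(0)
values = [random.random() for _ in range(10_000)]
forward = sum(values)
backward = sum(reversed(values))
print(forward == backward)  # may be True or False
print(allclose_like([forward], [backward], rtol=1e-4, atol=1e-4))  # True
```

Reversing a 10,000-term sum may or may not change the last bits, but the tolerance check passes either way, which is the behaviour you want when the two results are mathematically equal.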

MARD1NO · Jul 27 '22 06:07

Thanks for your reply! However, the results are still inconsistent even without oneflow.sum(), when the Fold outputs are compared element by element:

import oneflow

count = 0
input = oneflow.rand(2, 72, 16, dtype=oneflow.float64).cuda()
input_grad = input.clone().requires_grad_(True)
for i in range(1):
    mod = oneflow.nn.Fold(dilation=1, kernel_size=3, output_size=[8, 8], padding=1, stride=2)
    output = mod(input)
    output_grad = mod(input_grad)
    #print(output==output_grad)
    # Element-wise exact comparison of the two Fold outputs
    if not (output == output_grad).all():
        # Show where and by how much the two results differ
        print(output - output_grad)
        count += 1
print(count)
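Printing output - output_grad shows that the tensors differ, but not whether the difference is rounding-sized. A small diagnostic can separate the two cases by reporting the largest absolute difference next to mismatch counts under exact and tolerant comparison. In this sketch, compare_elementwise is a hypothetical helper and plain Python lists stand in for the flattened Fold outputs. It uses the standard library's math.isclose, whose rule, abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol), differs slightly from np.allclose's.

```python
import math


def compare_elementwise(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Hypothetical helper: summarise how two equal-length float lists differ.

    Returns (max_abs_diff, exact_mismatches, tolerant_mismatches).
    """
    if len(a) != len(b):
        raise ValueError("length mismatch")
    max_abs_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    # Count elements that are not bit-for-bit equal.
    exact = sum(1 for x, y in zip(a, b) if x != y)
    # Count elements that are not equal even within the tolerance.
    tolerant = sum(
        1 for x, y in zip(a, b)
        if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
    )
    return max_abs_diff, exact, tolerant


# Each "pixel" is a sum of the same three values, accumulated in two orders.
triples = [(0.1, 0.2, 0.3), (1.0, 2.0, 3.0), (0.25, 0.5, 0.125)]
left_first = [(x + y) + z for x, y, z in triples]
right_first = [x + (y + z) for x, y, z in triples]

max_diff, exact, tolerant = compare_elementwise(left_first, right_first)
print(max_diff)
print(exact)     # 1
print(tolerant)  # 0
```

Only the first element differs, by one unit in the last place (about 1.1e-16), and the tolerant comparison treats it as equal. A maximum difference of that order in the printed output - output_grad would point to rounding; a large difference would not.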

The printed difference between output and output_grad: [image]

xxxyyyzzz12345 · Jul 27 '22 08:07