CLIP
Output differs on same input
Hello, thanks for releasing the model!
I am observing different outputs on the same input (only between the first run and the second one; the subsequent runs agree with the second one). The following code reproduces the problem.
import torch

print(f"Torch version: {torch.__version__}")

model = torch.jit.load("model.pt").cuda().eval()

torch.manual_seed(0)
x = torch.randn((1, 3, 224, 224), dtype=torch.float32).to("cuda")

with torch.no_grad():
    image_features_1 = model.encode_image(x).float()
    image_features_2 = model.encode_image(x).float()
    image_features_3 = model.encode_image(x).float()

print(torch.max(torch.abs(image_features_1 - image_features_2)))
print(torch.max(torch.abs(image_features_3 - image_features_2)))
The output:
Torch version: 1.7.1
tensor(0.0039, device='cuda:0')
tensor(0., device='cuda:0')
By the way, we checked the model's buffers and parameters, and they do not change between the calls.
Hi, thanks for reporting this issue.
I suspect it's due to CUDA's nondeterministic behavior, as it doesn't happen in the CPU mode.
Slight numerical imprecision is sometimes inevitable, especially when dealing with half-precision models. A workaround is to "warm up" the model as you did; after a third call or so it is more likely to emit deterministic outputs.
Thanks for the fast reply! I observe the same behavior even with torch.set_deterministic(True) at the top and CUBLAS_WORKSPACE_CONFIG=:4096:8.
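For reference, here is a minimal sketch of the determinism settings mentioned above. Note two assumptions from the PyTorch docs: the cuBLAS workspace variable must be set before CUDA is initialized, and torch.set_deterministic (the PyTorch 1.7 API) was later replaced by torch.use_deterministic_algorithms.

```python
import os

# Must be set before CUDA is initialized, i.e. before any CUDA tensor work.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

torch.manual_seed(0)

# PyTorch 1.7 exposed torch.set_deterministic(True); newer releases use
# torch.use_deterministic_algorithms(True) instead.
if hasattr(torch, "use_deterministic_algorithms"):
    torch.use_deterministic_algorithms(True)
else:
    torch.set_deterministic(True)
```

As noted above, even with these settings the first-call discrepancy persists.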
Thanks for more clues; I observed that too. I don't fully grasp what is happening under the hood, but will keep this issue open in case there's a fix possible, potentially in a future PyTorch version.
By the way, the problem is much worse under PyTorch 1.7.0 (the maximum absolute difference goes up to 2.7236), and quite interestingly, when I use it for zero-shot classification, only the first output (image_features_1) translates to sensible results. I suggest putting an assert on the version number.
Yeah that was why I needed to require 1.7.1. Putting an assert is a good idea, thanks!
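A minimal sketch of what such a version guard could look like; the helper names here (parse_version, assert_torch_version) are ours for illustration, not part of CLIP.

```python
def parse_version(version: str):
    # Strip local build tags ("1.7.1+cu110" -> "1.7.1") and keep the
    # numeric components as a comparable tuple: (1, 7, 1).
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def assert_torch_version(current: str, minimum: str = "1.7.1"):
    # Fail early rather than silently producing wrong encodings on 1.7.0.
    if parse_version(current) < parse_version(minimum):
        raise RuntimeError(f"PyTorch >= {minimum} is required, found {current}")


# In practice you would pass torch.__version__ here.
assert_torch_version("1.7.1")  # passes
```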
By the way, the above problem goes away on 1.7.0 with jit=True in clip.load.
My observation is that on PyTorch 1.7.0, the runs after warm-up produce wrong results.
For example, the sample in the README starts returning roughly equal scores of ~30% for each caption on the second run and after:
Label probs: [[0.311 0.331 0.358]]
PyTorch 1.7.1 solves the problem.
I've observed the same problem. Here are two Colab notebooks in case somebody wants to test:
- 1.7.0: https://colab.research.google.com/drive/1KcHMeI2N-FFthMEYip3A1ZNbccdRK4XR?usp=sharing
- 1.7.1: https://colab.research.google.com/drive/1h3pyWaZ0gA4DYWoip49W1skBmkxZ_gDC?usp=sharing
With 1.7.0 the results on the second run are very far off.