CLIP
Output differs on same input
Hello, thanks for releasing the model!
I am observing different outputs on the same input (only between the first run and the second one; the subsequent runs agree with the second one). The following code reproduces the problem.
import torch

print(f"Torch version: {torch.__version__}")

model = torch.jit.load("model.pt").cuda().eval()

torch.manual_seed(0)
x = torch.randn((1, 3, 224, 224), dtype=torch.float32).to("cuda")

with torch.no_grad():
    image_features_1 = model.encode_image(x).float()
    image_features_2 = model.encode_image(x).float()
    image_features_3 = model.encode_image(x).float()

print(torch.max(torch.abs(image_features_1 - image_features_2)))
print(torch.max(torch.abs(image_features_3 - image_features_2)))
The output:
Torch version: 1.7.1
tensor(0.0039, device='cuda:0')
tensor(0., device='cuda:0')
By the way, we checked the model's buffers and parameters, and they do not change between the calls.
Hi, thanks for reporting this issue.
I suspect it's due to CUDA's nondeterministic behavior, as it doesn't happen in the CPU mode.
Slight numerical imprecision is sometimes inevitable, especially when dealing with half-precision models. A workaround is to "warm up" the model as you did; after a third call or so it is more likely to emit deterministic outputs.
Thanks for the fast reply! I observe the same behavior even with torch.set_deterministic(True) at the top and CUBLAS_WORKSPACE_CONFIG=:4096:8.
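For reference, here is a minimal sketch of the determinism settings mentioned above. Note two assumptions from the PyTorch docs: the cuBLAS workspace variable must be set before CUDA is initialized, and torch.set_deterministic (the PyTorch 1.7 API) was later replaced by torch.use_deterministic_algorithms.

```python
import os

# Must be set before CUDA is initialized, i.e. before any CUDA tensor work.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

torch.manual_seed(0)

# PyTorch 1.7 exposed torch.set_deterministic(True); newer releases use
# torch.use_deterministic_algorithms(True) instead.
if hasattr(torch, "use_deterministic_algorithms"):
    torch.use_deterministic_algorithms(True)
else:
    torch.set_deterministic(True)
```

As noted above, even with these settings the first-call discrepancy persists.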
Thanks for more clues; I observed that too. I don't fully grasp what is happening under the hood, but will keep this issue open in case there's a fix possible, potentially in a future PyTorch version.
By the way, the problem is much worse under PyTorch 1.7.0 (the maximum absolute difference goes up to 2.7236), and quite interestingly, when I use it for zero-shot classification, only the first output (image_features_1) translates to sensible results. I suggest putting an assert on the version number.
Yeah that was why I needed to require 1.7.1. Putting an assert is a good idea, thanks!
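A minimal sketch of what such a version guard could look like; the helper names here (parse_version, assert_torch_version) are ours for illustration, not part of CLIP.

```python
def parse_version(version: str):
    # Strip local build tags ("1.7.1+cu110" -> "1.7.1") and keep the
    # numeric components as a comparable tuple: (1, 7, 1).
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def assert_torch_version(current: str, minimum: str = "1.7.1"):
    # Fail early rather than silently producing wrong encodings on 1.7.0.
    if parse_version(current) < parse_version(minimum):
        raise RuntimeError(f"PyTorch >= {minimum} is required, found {current}")


# In practice you would pass torch.__version__ here.
assert_torch_version("1.7.1")  # passes
```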
By the way, the above problem goes away on 1.7.0 with jit=True in clip.load.
My observation is that on PyTorch 1.7.0, the runs after warm-up produce wrong results.
For example, the sample in the README starts returning roughly equal scores of ~30% for each caption on the second run and after:
Label probs: [[0.311 0.331 0.358]]
PyTorch 1.7.1 solves the problem.
I've observed the same problem. Here are two Colab notebooks in case somebody wants to test:
- 1.7.0: https://colab.research.google.com/drive/1KcHMeI2N-FFthMEYip3A1ZNbccdRK4XR?usp=sharing
- 1.7.1: https://colab.research.google.com/drive/1h3pyWaZ0gA4DYWoip49W1skBmkxZ_gDC?usp=sharing
With 1.7.0 the results on the second run are very far off.