openvino
[Bug]: Pytorch model converted to openvino returns zeros on Intel GPU
OpenVINO Version
2024.0.0-14473-3238290df5e
Operating System
Other (Please specify in description)
Device used for inference
GPU
Framework
PyTorch
Model used
No response
Issue description
I created a custom model to reproduce the bug. The model takes a tensor of shape (1, 4096, 4096), reshapes it to (1, 64, 64, 64, 64), runs some other operations on it, and then reshapes it back to (1, 4096, 4096). With this reshaping in place, the model returns zeros as an output. If I comment out the reshaping lines, everything works correctly.
class CustomModel(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def buggy_method(self, mask):
        mask = mask.view(mask.shape[0], 64, 64, 64, 64)
        # running some calculations
        mask = mask.view(mask.shape[0], 64 * 64, 64 * 64)
        return mask

    # x has shape (1, 4096, 4096)
    def forward(self, x):
        mask = (x > 0.5).float()
        # If I comment this function everything works well
        mask = self.buggy_method(mask)
        mask = mask \
            * (x == x.max(dim=2, keepdim=True)[0]) \
            * (x == x.max(dim=1, keepdim=True)[0])
        a, _ = mask.max(dim=2)
        b = a[0] * torch.arange(0, a.shape[1])
        return a, b
This is my custom model. Here a returns valid values, but b returns all zeros on GPU, while it works correctly on CPU. If I comment out the line with self.buggy_method, everything works correctly.
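A minimal PyTorch-only sketch of why this result is unexpected: reshaping a contiguous tensor with view and then reshaping it back should leave the values unchanged, so buggy_method (with the intermediate calculations omitted, as in the reproducer) should be a no-op. The helper name round_trip and the reduced side length of 8 (instead of 64) are illustrative assumptions that keep the check fast.

```python
import torch


def round_trip(mask: torch.Tensor, side: int) -> torch.Tensor:
    """Reshape (N, side*side, side*side) to (N, side, side, side, side) and back."""
    n = mask.shape[0]
    mask = mask.view(n, side, side, side, side)
    return mask.view(n, side * side, side * side)


torch.manual_seed(0)
side = 8  # assumption: the report uses 64; 8 keeps this check small
x = torch.rand(1, side * side, side * side)
mask = (x > 0.5).float()

# The two views only change the shape metadata, so the data must be identical.
round_trip_is_identity = torch.equal(round_trip(mask, side), mask)
print(round_trip_is_identity)
```

If this prints True in eager PyTorch, any difference introduced by the reshape pair after conversion would have to come from the converted or compiled graph rather than from the model definition.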
Step-by-step reproduction
This is my script for reproducing the bug. You can also find the script and the input tensor in this Google Drive
import torch
import openvino as ov
import numpy as np


class CustomModel(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def buggy_method(self, mask):
        mask = mask.view(mask.shape[0], 64, 64, 64, 64)
        # running some calculations
        mask = mask.view(mask.shape[0], 64 * 64, 64 * 64)
        return mask

    # x has shape (1, 4096, 4096)
    def forward(self, x):
        mask = (x > 0.5).float()
        # If I comment this function everything works well
        mask = self.buggy_method(mask)
        mask = mask \
            * (x == x.max(dim=2, keepdim=True)[0]) \
            * (x == x.max(dim=1, keepdim=True)[0])
        a, _ = mask.max(dim=2)
        b = a[0] * torch.arange(0, a.shape[1])
        return a, b


if __name__ == "__main__":
    model = CustomModel()
    model = model.to("cpu")
    model = model.eval()

    dummy_input = torch.load("conf_matrix.pt")
    with torch.no_grad():
        _ = model(dummy_input)

    core = ov.Core()
    ov_model = ov.convert_model(model, example_input=dummy_input)
    compiled_model_cpu = core.compile_model(
        model=ov_model,
        device_name="CPU",
        config={"INFERENCE_PRECISION_HINT": ov.Type.f32},
    )
    compiled_model_gpu = core.compile_model(
        model=ov_model,
        device_name="GPU.0",
        config={"INFERENCE_PRECISION_HINT": ov.Type.f32},
    )

    openvino_dummy_input = dummy_input.numpy()
    output_cpu = compiled_model_cpu(openvino_dummy_input)
    output_gpu = compiled_model_gpu(openvino_dummy_input)

    print(output_cpu[0])
    print(output_gpu[0])
    print(np.all(output_cpu[0] - output_gpu[0] < 1e-5))

    print(output_cpu[1])
    print(output_gpu[1])
    print(output_gpu[1].max())
    print(np.all(output_cpu[1] - output_gpu[1] < 1e-5))
When I run this code, I get this output:
[[1. 1. 1. ... 1. 1. 1.]]
[[1. 1. 1. ... 1. 1. 1.]]
True
[0.000e+00 1.000e+00 2.000e+00 ... 4.093e+03 4.094e+03 4.095e+03]
[0. 0. 0. ... 0. 0. 0.]
0.0
False
Here the values of a are the same on CPU and GPU, but the values of b on GPU are all zeros.
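To make the expected value of b concrete, here is a small PyTorch sketch. It assumes that a is all ones, as the truncated printout suggests; the tensor below is a stand-in, not the real model output. Under that assumption, b = a[0] * torch.arange(...) should simply reproduce the index vector 0..4095, which matches the CPU result.

```python
import torch

# Stand-in for output `a` (assumption: all ones, shape (1, 4096)).
a = torch.ones(1, 4096)

index = torch.arange(0, a.shape[1])  # int64 indices 0..4095
b = a[0] * index                     # type promotion yields a float32 tensor

# With `a` all ones, `b` must equal the index vector element for element.
b_matches_index = torch.equal(b, index.to(b.dtype))
print(b_matches_index)
print(b.max().item())
```

Since the GPU run returns a as ones but b as zeros, the discrepancy appears to arise somewhere in this final multiplication in the compiled GPU graph, not in the values of a that are returned.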
If I comment out the line with self.buggy_method, I get this output:
[[1. 1. 1. ... 1. 1. 1.]]
[[1. 1. 1. ... 1. 1. 1.]]
True
[0.000e+00 1.000e+00 2.000e+00 ... 4.093e+03 4.094e+03 4.095e+03]
[0.000e+00 1.000e+00 2.000e+00 ... 4.093e+03 4.094e+03 4.095e+03]
4095.0
True
In this case everything works correctly.
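A note on the check used in the script: np.all(output_cpu[1] - output_gpu[1] < 1e-5) is one-sided. It catches the mismatch in this report because the CPU values are larger than the GPU ones, but it would report True if the difference had the opposite sign. A minimal sketch of a symmetric comparison with np.allclose follows; the helper name outputs_match and the stand-in arrays are illustrative assumptions, not real inference results.

```python
import numpy as np


def outputs_match(ref, out, atol=1e-5):
    """Return True only if both arrays have the same shape and agree within atol."""
    ref = np.asarray(ref)
    out = np.asarray(out)
    return ref.shape == out.shape and bool(np.allclose(ref, out, rtol=0.0, atol=atol))


# Stand-ins shaped like output `b` in the report (assumed values).
cpu_b = np.arange(4096, dtype=np.float32)
gpu_b = np.zeros(4096, dtype=np.float32)

# Symmetric check: detects the mismatch regardless of sign.
symmetric_result = outputs_match(cpu_b, gpu_b)

# One-sided check with the operands swapped: every difference is <= 0,
# so it passes even though the arrays clearly differ.
one_sided_result = bool(np.all(gpu_b - cpu_b < 1e-5))

print(symmetric_result)
print(one_sided_result)
```

Using an absolute tolerance with rtol=0.0 keeps the threshold comparable to the 1e-5 used in the script.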
Relevant log output
[[1. 1. 1. ... 1. 1. 1.]]
[[1. 1. 1. ... 1. 1. 1.]]
True
[0.000e+00 1.000e+00 2.000e+00 ... 4.093e+03 4.094e+03 4.095e+03]
[0. 0. 0. ... 0. 0. 0.]
0.0
False
Issue submission checklist
- [X] I'm reporting an issue. It's not a question.
- [X] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- [X] There is reproducer code and related data files such as images, videos, models, etc.
I've run the main.py script with and without Line 24 (self.buggy_method), and I encountered the same issue as you did.
Comment Line 24:
Uncomment Line 24:
We'll investigate the issue and update you as soon as possible.
Ref. 140654