[Bug]: OpenVINO as backend to PyTorch - integrated GPU works but discrete GPU gives NaNs
OpenVINO Version
2024.2.0
Operating System
Other (Please specify in description)
Device used for inference
GPU
Framework
PyTorch
Model used
No response
Issue description
Using OpenVINO 2024.2.0 as a backend to PyTorch (intel_extension_for_pytorch 2.1.30.post0) with Python 3.10 in a conda-forge environment on Ubuntu 24.04 and oneAPI 2024.1. The machine is a 12th Gen NUC with a 12th Gen Intel(R) Core(TM) i7-12700H CPU and an Intel Arc A770M GPU.
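When re-testing on a newer release, the versions listed above can be collected programmatically. This is a minimal sketch using only the standard library. `collect_versions` is a hypothetical helper, and the distribution names in `PACKAGES` are assumptions that may need adjusting for your environment. Packages that are not installed are reported as `None`.

```python
import platform
from importlib import metadata

# Distribution names assumed for the packages mentioned in this report;
# adjust them if your environment installs them under different names.
PACKAGES = ("openvino", "torch", "torchvision", "intel_extension_for_pytorch")


def collect_versions(packages=PACKAGES):
    """Return Python/OS info plus {package: version or None}."""
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in the current environment
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```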
hello_query_device output from the environment:
(py310) user@NUC12SNKi72:~$ python3 /usr/share/openvino/samples/python/hello_query_device/hello_query_device.py
[ INFO ] Available devices:
[ INFO ] CPU :
[ INFO ] SUPPORTED_PROPERTIES:
[ INFO ] AVAILABLE_DEVICES:
[ INFO ] RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 1, 1
[ INFO ] RANGE_FOR_STREAMS: 1, 20
[ INFO ] EXECUTION_DEVICES: CPU
[ INFO ] FULL_DEVICE_NAME: 12th Gen Intel(R) Core(TM) i7-12700H
[ INFO ] OPTIMIZATION_CAPABILITIES: FP32, INT8, BIN, EXPORT_IMPORT
[ INFO ] DEVICE_TYPE: Type.INTEGRATED
[ INFO ] DEVICE_ARCHITECTURE: intel64
[ INFO ] NUM_STREAMS: 1
[ INFO ] INFERENCE_NUM_THREADS: 0
[ INFO ] PERF_COUNT: False
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float32'>
[ INFO ] PERFORMANCE_HINT: PerformanceMode.LATENCY
[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] ENABLE_CPU_PINNING: True
[ INFO ] SCHEDULING_CORE_TYPE: SchedulingCoreType.ANY_CORE
[ INFO ] MODEL_DISTRIBUTION_POLICY: set()
[ INFO ] ENABLE_HYPER_THREADING: True
[ INFO ] DEVICE_ID:
[ INFO ] CPU_DENORMALS_OPTIMIZATION: False
[ INFO ] LOG_LEVEL: Level.NO
[ INFO ] CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1.0
[ INFO ] DYNAMIC_QUANTIZATION_GROUP_SIZE: 0
[ INFO ] KV_CACHE_PRECISION: <Type: 'float16'>
[ INFO ] AFFINITY: Affinity.HYBRID_AWARE
[ INFO ]
[ INFO ] GPU.0 :
[ INFO ] SUPPORTED_PROPERTIES:
[ INFO ] AVAILABLE_DEVICES: 0, 1
[ INFO ] RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 2, 1
[ INFO ] RANGE_FOR_STREAMS: 1, 2
[ INFO ] OPTIMAL_BATCH_SIZE: 1
[ INFO ] MAX_BATCH_SIZE: 1
[ INFO ] DEVICE_ARCHITECTURE: GPU: vendor=0x8086 arch=v12.3.0
[ INFO ] FULL_DEVICE_NAME: Intel(R) Iris(R) Xe Graphics (iGPU)
[ INFO ] DEVICE_UUID: 8680a6460c0000000002000000000000
[ INFO ] DEVICE_LUID: 0200000000000000
[ INFO ] DEVICE_TYPE: Type.INTEGRATED
[ INFO ] DEVICE_GOPS: {<Type: 'float16'>: 4300.7998046875, <Type: 'float32'>: 2150.39990234375, <Type: 'int8_t'>: 8601.599609375, <Type: 'uint8_t'>: 8601.599609375}
[ INFO ] OPTIMIZATION_CAPABILITIES: FP32, BIN, FP16, INT8, EXPORT_IMPORT
[ INFO ] GPU_DEVICE_TOTAL_MEM_SIZE: 14863626240
[ INFO ] GPU_UARCH_VERSION: 12.3.0
[ INFO ] GPU_EXECUTION_UNITS_COUNT: 96
[ INFO ] GPU_MEMORY_STATISTICS: {}
[ INFO ] PERF_COUNT: False
[ INFO ] MODEL_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_HOST_TASK_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_QUEUE_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_QUEUE_THROTTLE: Priority.MEDIUM
[ INFO ] GPU_ENABLE_SDPA_OPTIMIZATION: True
[ INFO ] GPU_ENABLE_LOOP_UNROLLING: True
[ INFO ] GPU_DISABLE_WINOGRAD_CONVOLUTION: False
[ INFO ] CACHE_DIR:
[ INFO ] CACHE_MODE: CacheMode.OPTIMIZE_SPEED
[ INFO ] PERFORMANCE_HINT: PerformanceMode.LATENCY
[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ] COMPILATION_NUM_THREADS: 20
[ INFO ] NUM_STREAMS: 1
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float16'>
[ INFO ] ENABLE_CPU_PINNING: False
[ INFO ] DEVICE_ID: 0
[ INFO ]
[ INFO ] GPU.1 :
[ INFO ] SUPPORTED_PROPERTIES:
[ INFO ] AVAILABLE_DEVICES: 0, 1
[ INFO ] RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 2, 1
[ INFO ] RANGE_FOR_STREAMS: 1, 2
[ INFO ] OPTIMAL_BATCH_SIZE: 1
[ INFO ] MAX_BATCH_SIZE: 1
[ INFO ] DEVICE_ARCHITECTURE: GPU: vendor=0x8086 arch=v12.55.8
[ INFO ] FULL_DEVICE_NAME: Intel(R) Arc(TM) A770M Graphics (dGPU)
[ INFO ] DEVICE_UUID: 86809056080000000300000000000000
[ INFO ] DEVICE_LUID: 0200000000000000
[ INFO ] DEVICE_TYPE: Type.DISCRETE
[ INFO ] DEVICE_GOPS: {<Type: 'float16'>: 0.0, <Type: 'float32'>: 16793.599609375, <Type: 'int8_t'>: 0.0, <Type: 'uint8_t'>: 0.0}
[ INFO ] OPTIMIZATION_CAPABILITIES: FP32, BIN, FP16, INT8, GPU_HW_MATMUL, EXPORT_IMPORT
[ INFO ] GPU_DEVICE_TOTAL_MEM_SIZE: 16225243136
[ INFO ] GPU_UARCH_VERSION: 12.55.8
[ INFO ] GPU_EXECUTION_UNITS_COUNT: 512
[ INFO ] GPU_MEMORY_STATISTICS: {}
[ INFO ] PERF_COUNT: False
[ INFO ] MODEL_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_HOST_TASK_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_QUEUE_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_QUEUE_THROTTLE: Priority.MEDIUM
[ INFO ] GPU_ENABLE_SDPA_OPTIMIZATION: True
[ INFO ] GPU_ENABLE_LOOP_UNROLLING: True
[ INFO ] GPU_DISABLE_WINOGRAD_CONVOLUTION: False
[ INFO ] CACHE_DIR:
[ INFO ] CACHE_MODE: CacheMode.OPTIMIZE_SPEED
[ INFO ] PERFORMANCE_HINT: PerformanceMode.LATENCY
[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ] COMPILATION_NUM_THREADS: 20
[ INFO ] NUM_STREAMS: 1
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float16'>
[ INFO ] ENABLE_CPU_PINNING: False
[ INFO ] DEVICE_ID: 1
[ INFO ]
(py310) user@NUC12SNKi72:~$
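To compare the two GPUs side by side, the query output above can be split per device. This is a minimal sketch that assumes the `[ INFO ] KEY: value` line layout shown in the log. `parse_query_output` and `differing_properties` are hypothetical helpers, not part of OpenVINO. In this log, both GPU.0 and GPU.1 report `INFERENCE_PRECISION_HINT: <Type: 'float16'>`, while only GPU.1 reports a float16 `DEVICE_GOPS` of 0.0. A per-device view makes differences like that easier to spot.

```python
import re

# "[ INFO ] KEY: value" lines; the value may be empty.
PROPERTY_RE = re.compile(r"^\[ INFO \]\s+([A-Z0-9_]+):\s*(.*)$")
# Device header lines such as "[ INFO ] CPU :" or "[ INFO ] GPU.1 :".
DEVICE_RE = re.compile(r"^\[ INFO \]\s+([A-Z]+(?:\.\d+)?)\s*:\s*$")


def parse_query_output(text):
    """Split hello_query_device output into {device: {property: value}}."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = DEVICE_RE.match(line)
        if header:
            current = devices.setdefault(header.group(1), {})
            continue
        prop = PROPERTY_RE.match(line)
        if prop and current is not None:
            current[prop.group(1)] = prop.group(2).strip()
    return devices


def differing_properties(devices, a, b):
    """Return {property: (value_on_a, value_on_b)} where devices a and b disagree."""
    keys = set(devices[a]) | set(devices[b])
    return {
        k: (devices[a].get(k), devices[b].get(k))
        for k in sorted(keys)
        if devices[a].get(k) != devices[b].get(k)
    }
```

Pasting the full log into `parse_query_output` and calling `differing_properties(devices, "GPU.0", "GPU.1")` lists every property that differs between the iGPU and the dGPU.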
Step-by-step reproduction
In the code below, if the device in the torch.compile() line is changed from GPU.1 to GPU.0, the prediction contains finite values. If GPU.1 is used, NaNs are printed.
(py310) user@NUC12SNKi72:~$ cat test.py
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device with PyTorch
import torchvision.models as models
import openvino.torch  # registers the "openvino" backend for torch.compile
model = models.resnet50(weights="ResNet50_Weights.DEFAULT")
model.eval()
# "GPU.1" is the Arc A770M (dGPU) and "GPU.0" the Iris Xe (iGPU) in the device list above
model = torch.compile(model, backend="openvino", options={"device": "GPU.1", "model_caching": True, "cache_dir": "./model_cache"})
#model = torch.compile(model, backend="openvino", options={"device": "CPU"})
model = model.to("xpu")
data = torch.rand((1, 3, 224, 224)).to("xpu")
print("Input data shape: ", data.shape)
pred = model(data)
print("Prediction: ", pred)
(py310) user@NUC12SNKi72:~$
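One way to narrow the problem down is to run the same input through eager PyTorch on the CPU and through each compiled device. You can then count non-finite values and compare each output against the CPU reference. The sketch below keeps the checks framework-agnostic: it works on plain Python floats, such as those from `pred.detach().cpu().flatten().tolist()`. `count_non_finite`, `max_abs_diff` and `compile_options` are hypothetical helpers. The `config` entry assumes that the OpenVINO `torch.compile` backend forwards it to the runtime as device properties, as described in OpenVINO's torch.compile documentation. Both GPUs report a float16 inference precision above, so requesting `INFERENCE_PRECISION_HINT` of `f32` is worth trying. It is an untested experiment, not a confirmed fix.

```python
import math


def count_non_finite(values):
    """Number of NaN or +/-inf entries in a flat sequence of floats."""
    return sum(1 for v in values if not math.isfinite(v))


def max_abs_diff(reference, candidate):
    """Largest element-wise |reference - candidate|; inf if any value is non-finite."""
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    worst = 0.0
    for r, c in zip(reference, candidate):
        if not (math.isfinite(r) and math.isfinite(c)):
            return math.inf
        worst = max(worst, abs(r - c))
    return worst


def compile_options(device, force_fp32=False):
    """Build an options dict for torch.compile(..., backend="openvino").

    Assumption: the "config" key is forwarded to the OpenVINO runtime as device
    properties. The caching options from the original script are omitted here.
    """
    options = {"device": device}
    if force_fp32:
        options["config"] = {"INFERENCE_PRECISION_HINT": "f32"}
    return options
```

Suppose `count_non_finite` is non-zero for GPU.1 with the default options but zero with `force_fp32=True`. That result would suggest reduced-precision execution on the dGPU. NaNs in both cases would point somewhere else.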
Relevant log output
No response
Issue submission checklist
- [X] I'm reporting an issue. It's not a question.
- [X] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- [X] There is reproducer code and related data files such as images, videos, models, etc.
Any updates?
This issue will be closed in a week because of 9 months of no activity.
This issue was closed because it has been stalled for 9 months with no activity.