AMDMIGraphX
[Issue]: Investigate and Fix GPU error with int8 reduced layer models
Problem Description
We are seeing a GPU memory access fault when running the onnxruntime-inference-examples benchmarking script with reduced-layer BERT models. The quantization/calibration steps appear to complete successfully; the fault occurs during inference.
Collecting tensor data and making histogram ...
Finding optimal threshold for each tensor using percentile algorithm ...
Number of tensors : 66
Number of histogram bins : 2048
Percentile : (0.0010000000000047748,99.999)
Calibration is done. Calibration cache is saved to calibration.json
Int8 Quantization Done with Onnxruntime Quantizer
QDQ model is saved to ./qdq_model.onnx
Running Inferences
Memory access fault by GPU node-1 (Agent handle: 0x5581f06a9ff0) on address 0x7f04bdc71000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
root@aus-navi3x-02:/works
This is blocking us from collecting customer data for int8 and mixed-precision int8/fp16 results.
Operating System
Ubuntu 22.04
CPU
Whatever CI is using
GPU
AMD Radeon RX 7900 XT
Other
No response
ROCm Version
ROCm 6.0.0
Steps to Reproduce
Run the script from /workspace/onnxruntime-inference-examples/quantization/nlp/bert/migraphx with the --int8 flag.
The fault is reproducible on both Navi31 and Navi32 cards.
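For context, a minimal sketch of how the --int8 path typically configures ONNX Runtime's MIGraphX execution provider. The option names follow the MIGraphX EP documentation (migraphx_int8_enable, migraphx_int8_calibration_table_name, migraphx_fp16_enable); the helper function and default table name are illustrative assumptions, not the example script's actual code.

```python
def migraphx_int8_providers(calibration_table="calibration.json", use_fp16=False):
    """Build the `providers` argument for ort.InferenceSession.

    Hypothetical helper: option names come from the MIGraphX EP docs,
    but the real benchmark script may wire these differently.
    """
    opts = {
        # Enable int8 quantized execution in the MIGraphX EP.
        "migraphx_int8_enable": "1",
        # Calibration table produced by the quantization/calibration step.
        "migraphx_int8_calibration_table_name": calibration_table,
    }
    if use_fp16:
        # Mixed-precision int8 + fp16 run.
        opts["migraphx_fp16_enable"] = "1"
    # CPU EP as a fallback for unsupported ops.
    return [("MIGraphXExecutionProvider", opts), "CPUExecutionProvider"]


# Usage (requires onnxruntime with the MIGraphX EP built in):
#   import onnxruntime as ort
#   session = ort.InferenceSession("qdq_model.onnx",
#                                  providers=migraphx_int8_providers())
```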
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response