AMDMIGraphX
[Issue]: Investigate and Fix GPU error with int8 reduced layer models
Problem Description
We are seeing a GPU memory access fault when running the onnxruntime-inference-examples benchmarking script with reduced-layer BERT models. The quantization/calibration steps appear to complete successfully; the fault occurs during inference.
Collecting tensor data and making histogram ...
Finding optimal threshold for each tensor using percentile algorithm ...
Number of tensors : 66
Number of histogram bins : 2048
Percentile : (0.0010000000000047748,99.999)
Calibration is done. Calibration cache is saved to calibration.json
Int8 Quantization Done with Onnxruntime Quantizer
QDQ model is saved to ./qdq_model.onnx
Running Inferences
Memory access fault by GPU node-1 (Agent handle: 0x5581f06a9ff0) on address 0x7f04bdc71000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
root@aus-navi3x-02:/works
This is blocking us from collecting customer data for int8 and mixed-precision int8/fp16 results.
Operating System
Ubuntu 22.04
CPU
Whatever CI is using
GPU
AMD Radeon RX 7900 XT
Other
No response
ROCm Version
ROCm 6.0.0
Steps to Reproduce
Run the script from /workspace/onnxruntime-inference-examples/quantization/nlp/bert/migraphx with the --int8 flag.
The fault is reproducible on both Navi31 and Navi32 cards.
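For context, a minimal sketch of how the --int8 path typically configures ONNX Runtime's MIGraphX execution provider. The option names follow the MIGraphX EP documentation (migraphx_int8_enable, migraphx_int8_calibration_table_name, migraphx_fp16_enable); the helper function and default table name are illustrative assumptions, not the example script's actual code.

```python
def migraphx_int8_providers(calibration_table="calibration.json", use_fp16=False):
    """Build the `providers` argument for ort.InferenceSession.

    Hypothetical helper: option names come from the MIGraphX EP docs,
    but the real benchmark script may wire these differently.
    """
    opts = {
        # Enable int8 quantized execution in the MIGraphX EP.
        "migraphx_int8_enable": "1",
        # Calibration table produced by the quantization/calibration step.
        "migraphx_int8_calibration_table_name": calibration_table,
    }
    if use_fp16:
        # Mixed-precision int8 + fp16 run.
        opts["migraphx_fp16_enable"] = "1"
    # CPU EP as a fallback for unsupported ops.
    return [("MIGraphXExecutionProvider", opts), "CPUExecutionProvider"]


# Usage (requires onnxruntime with the MIGraphX EP built in):
#   import onnxruntime as ort
#   session = ort.InferenceSession("qdq_model.onnx",
#                                  providers=migraphx_int8_providers())
```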
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response