[Issue]: Running ArgMax op using MIGraphX EP (ONNX Runtime)
Problem Description
We get a core dump when trying to run an ArgMax operation using the MIGraphX EP in ONNX Runtime. A short error trace follows:
[W:onnxruntime:Default, migraphx_execution_provider.cc:1298 compile_program] Model Compile: Begin
[W:onnxruntime:Default, migraphx_execution_provider.cc:1303 compile_program] Model Compile: Complete
Successfully created inference session using provider: MIGraphXExecutionProvider
Detected Input: Name='/Resize_4_output_0', Original Shape=[1, 6, 540, 960], Static Shape=[1, 6, 540, 960], Type=tensor(float)
Generating dummy input for '/Resize_4_output_0' with shape [1, 6, 540, 960] and type float32
Performing a warm-up run...
:0:/longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/clr/hipamd/src/hip_global.cpp:158 : 2323738241245 us: Module not initialized
Aborted (core dumped)
Operating System
Ubuntu 22.04.5 LTS (Jammy Jellyfish)
CPU
AMD Ryzen 7 8700G
GPU
Other: Radeon 780M Graphics
ROCm Version
ROCm 6.0.0
Steps to Reproduce
Create a minimal ONNX graph containing a single ArgMax operation. The input to the ArgMax op has shape [1, C, H, W], and the argmax is taken over the C dimension (axis 1). We have tried executing this graph with both the Python and the C++ ONNX Runtime APIs and get the same core dump in both cases.
Steps to recreate our setup
Enable the BIOS settings for the iGPU and VRAM
Install AMDGPU and ROCm 6.4.1 (for Ubuntu 22.04)
wget https://repo.radeon.com/amdgpu-install/6.4.1/ubuntu/jammy/amdgpu-install_6.4.60401-1_all.deb
sudo apt install ./amdgpu-install_6.4.60401-1_all.deb
sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo apt install amdgpu-dkms
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
sudo apt install rocm
Installation guide (for latest version)
Set up the ROCm paths
See: Post-installation instructions (ROCm installation, Linux)
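A small standard-library Python check that the current process belongs to the `render` and `video` groups added by the `usermod` step above. Group changes only take effect in new login sessions, so logging out and back in (or rebooting) may be needed before it reports nothing missing. The helper name `missing_gpu_groups` is hypothetical.

```python
import grp
import os


def missing_gpu_groups(required=("render", "video")):
    """Return the required groups that the current process is not a member of."""
    names = set()
    # os.getgroups() lists supplementary group IDs; add the primary group explicitly.
    for gid in set(os.getgroups()) | {os.getgid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A GID with no entry in the group database has no name to compare.
            continue
    return [g for g in required if g not in names]


if __name__ == "__main__":
    missing = missing_gpu_groups()
    if missing:
        print("Not in group(s):", ", ".join(missing), "- log out and back in after usermod")
    else:
        print("User is in the render and video groups.")
```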
Install MIGraphX (we need to compile MIGraphX from source to get support for gfx1103)
Clone the following repo: git clone https://github.com/ROCm/AMDMIGraphX.git
Check out the appropriate branch: git checkout rocm-6.4.1
Install the Ubuntu dependencies:
sudo apt update && sudo apt install -y \
cmake \
libnuma-dev \
miopen-hip-dev \
openmp-extras \
python3-dev \
python3-pip \
python3-venv \
rocblas-dev \
libgfortran5 \
hipblas-dev \
hipblaslt-dev \
hipcc \
rocm-cmake \
rocm-llvm-dev \
libtbb-dev \
libssl-dev
Create a conda environment for the dependencies (the dependency installer does not work out of the box, so we split it into the steps below):
conda create -n migraphx_env python
conda activate migraphx_env
pip install numpy
pip3 install setuptools wheel
pip3 install https://github.com/RadeonOpenCompute/rbuild/archive/master.tar.gz
Install and build the rest of the dependencies:
rbuild prepare -d depend
Compile the package:
CXX=/opt/rocm/llvm/bin/clang++ cmake .. -DGPU_TARGETS=$(/opt/rocm/bin/rocminfo | grep -o -m1 'gfx.*') -DCMAKE_PREFIX_PATH=/home/nav_common/workdir/3rdparty/AMDMIGraphX/depend/ -DONNX_USE_PROTOBUF_SHARED_LIBS=ON
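The `-DGPU_TARGETS` value above is taken from the first `gfx...` match in the `rocminfo` output. A Python sketch of the same extraction, handy for confirming that it resolves to `gfx1103` on this machine; it assumes `rocminfo` is at `/opt/rocm/bin/rocminfo` as in the command above, and `first_gfx_target` is a hypothetical helper.

```python
import re
import shutil
import subprocess

ROCMINFO = "/opt/rocm/bin/rocminfo"


def first_gfx_target(rocminfo_output):
    """Return the first gfx target name (e.g. 'gfx1103') in rocminfo output, or None."""
    match = re.search(r"\bgfx[0-9a-z]+", rocminfo_output)
    return match.group(0) if match else None


if __name__ == "__main__":
    if shutil.which(ROCMINFO):
        out = subprocess.run([ROCMINFO], capture_output=True, text=True, check=True).stdout
        print("GPU_TARGETS =", first_gfx_target(out))
    else:
        print(ROCMINFO, "not found")
```

Unlike `grep -o 'gfx.*'`, which prints everything from `gfx` to the end of the line, the regex stops at the first character that is not a lowercase letter or digit, so trailing whitespace never ends up in the CMake argument.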
ONNX Runtime
Clone the repo: git clone https://github.com/ROCm/onnxruntime.git
Check out the appropriate branch: git checkout rocm6.4_internal_testing
Build CMake newer than 3.26 from source (3.28.1 is used below):
sudo apt remove --purge --auto-remove cmake
sudo apt update && sudo apt install build-essential libtool autoconf unzip wget
version=3.28
build=1
## don't modify from here
mkdir ~/temp
cd ~/temp
wget https://cmake.org/files/v$version/cmake-$version.$build.tar.gz
tar -xzvf cmake-$version.$build.tar.gz
cd cmake-$version.$build/
./bootstrap
make -j$(nproc)
sudo make install
cmake --version
Compile ONNX Runtime: ./build.sh --config Release --parallel --build_wheel --build_shared_lib --use_migraphx --migraphx_home ../AMDMIGraphX/build/ --skip_tests
Additional package to install:
sudo apt-get update && sudo apt-get install -y libnuma1
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
ROCk module version 6.12.12 is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.15
Runtime Ext Version: 1.7
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
XNACK enabled: NO
DMAbuf Support: YES
VMM Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 8700G w/ Radeon 780M Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 8700G w/ Radeon 780M Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4200
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 48960044(0x2eb122c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 48960044(0x2eb122c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 48960044(0x2eb122c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 4
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 48960044(0x2eb122c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1103
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 2048(0x800) KB
Chip ID: 5567(0x15bf)
ASIC Revision: 12(0xc)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 2900
BDFID: 28672
Internal Node ID: 1
Compute Unit: 12
SIMDs per CU: 2
Shader Engines: 1
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties: APU
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 40
SDMA engine uCode:: 21
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 24480020(0x1758914) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 24480020(0x1758914) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1103
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
ISA 2
Name: amdgcn-amd-amdhsa--gfx11-generic
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Hi @smohta-all3. An internal ticket has been created to assist with your issue. Thanks!
Any updates on this?