[Feature]: Add support for AMD iGPU to enable multi-GPU usage with dGPU + iGPU in PyTorch
Is your feature request related to a problem? Please describe.
On a system with ROCm 6.4.1 and PyTorch 2.5.1, I have both an iGPU and a dGPU available:
- GPU[0]: Radeon RX 7900 XTX (Device ID: 0x744c, recognized as cuda:0)
- GPU[1]: AMD Radeon Graphics iGPU on Ryzen 9 9900X (Device ID: 0x13c0, recognized as cuda:1)
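For reference, the device mapping above can be checked from PyTorch itself (a minimal sketch; on a ROCm build, `torch.cuda` enumerates HIP devices):

```python
import torch

# List the devices the PyTorch/ROCm runtime sees; on CPU-only builds
# the count is simply 0 and the loop body never runs.
print("device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i} ->", torch.cuda.get_device_name(i))
```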
My goal is to use both GPUs together (discrete + integrated) for PyTorch computations. A matrix multiplication on the dGPU (cuda:0) works fine, but on the iGPU (cuda:1) it fails with a rocBLAS error. It seems gfx1036 is not supported in the TensileLibrary, so PyTorch cannot run BLAS operations on the iGPU and the process aborts.
Describe the solution you'd like
I'm looking for a way to use gfx1036 (the AMD iGPU) in PyTorch without errors. I think this is an extension of a previous issue: https://github.com/ROCm/rocBLAS/issues/1346. Is rocBLAS currently working on enabling simultaneous use of the iGPU and dGPU, or is such support being considered for the future? Ideally, I want to use both the iGPU and dGPU simultaneously in PyTorch for distributed or split workloads.
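As a concrete picture of the "split workloads" use case, here is a minimal sketch (not from this issue) that row-splits a matrix multiplication across two devices and gathers the result on the first one. It assumes both devices are usable for BLAS (i.e. after a working gfx1036 path), and falls back to CPU when two GPUs are not visible:

```python
import torch

def split_matmul(a, b, devices):
    """Row-split a @ b across the given devices, then gather on the first.
    A minimal illustration of a split workload, not a tuned implementation."""
    chunks = torch.chunk(a, len(devices), dim=0)              # split rows of a
    parts = [c.to(d) @ b.to(d) for c, d in zip(chunks, devices)]
    return torch.cat([p.to(devices[0]) for p in parts], dim=0)

# Use dGPU + iGPU when both are visible; otherwise fall back to CPU.
if torch.cuda.device_count() >= 2:
    devs = ["cuda:0", "cuda:1"]
else:
    devs = ["cpu", "cpu"]

a = torch.randn(256, 128)
b = torch.randn(128, 64)
out = split_matmul(a, b, devs)
print(out.shape)  # torch.Size([256, 64])
```

In practice the chunk sizes would be weighted by device throughput, since the iGPU is much slower than the dGPU.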
Hi @Piorosen,
rocBLAS doesn't officially support gfx1036. However, you can try setting the environment variable `HSA_OVERRIDE_GFX_VERSION_X=10.3.0` (replace `X` with the GPU index; in your case it would be 2). This should work because the gfx1036 instruction set architecture (ISA) is a superset of gfx1030's. Let me know if you still get the same rocBLAS error.
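As a config sketch of the suggested workaround (using the index from the reply above; the numeric suffix is the runtime's device index, which may not match rocm-smi's GPU[#] numbering, so verify it on your system):

```shell
# Make the runtime report the iGPU (gfx1036) as gfx1030,
# which the Tensile library does support.
export HSA_OVERRIDE_GFX_VERSION_2=10.3.0

# Then re-run the failing PyTorch workload on the iGPU device.
```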
The list of supported GPUs for ROCm 6.4.1 is at: https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.1/reference/system-requirements.html