Is ROCm 5.x ever planning to include gfx90c GPUs?
Hi, the official PyTorch and TensorFlow Docker images are available only for gfx900 (Vega10-type GPUs: MI25, Vega 56, Vega 64), gfx906 (Vega20-type GPUs: MI50, MI60), gfx908 (MI100), gfx90a (MI200), and gfx1030 (Navi21).
When is gfx90c support expected? Thanks
Hi, @shridharkini6!
Thanks for your request. Since I am not an employee at AMD, I have no insight into what is planned there internally. However, at least some amount of library coverage seems to be a prerequisite for extending the Docker images to this class of GPUs, which are integrated into the CPU (an "APU" in AMD's lingo). Yet I do not see any support for gfx90c as a TARGET in any of the public libraries. See my pull request for an attempt at a complete overview of the state of library support. PyTorch uses RCCL and MIOpen to run on ROCm, and so does TensorFlow. MIOpen in turn uses rocBLAS as its backend. For the available TARGETs, see the CMakeLists.txt of rocBLAS and the CMakeLists.txt of RCCL, respectively. As you can see, there is no support for gfx90c, and in fact for no other APU.
This aligns with what can be gathered from public sources, namely that AMD is focusing on the products which the hyperscalers and supercomputer customers are currently buying. I personally think this is fair enough, as those customers seem to be rather feature-sensitive. Starting from those high-profile customers, consider the following leaky pipe of support:
- Enterprise ("Instinct"-branded products intended for hyperscalers and supercomputer customers, usually sold in servers or racks)
- Professional ("Radeon PRO"-branded products intended for CAD and similar use cases, usually sold in workstations)
- Desktop ("Radeon"-branded products intended for demanding users like gamers and video editors, sold as dGPU components or pre-built systems)
- APUs ("Ryzen with Radeon Graphics"-branded products intended for lighter workloads like office PCs and thin/light laptops)
Things might change a bit with the Ryzen 7000 line of desktop processors, which is announced to include a chiplet-ish GPU in the IO die. Such an arrangement does not currently fit into this leaky support pipe, but I would also not hold my breath for any kind of revolution. My bet would be on support gradually improving, as it has (not without setbacks) in the past.
I do not think it is AMD's top priority to support an APU when even Navi 22 and Navi 23 are not supported. Also, AMD pulled the plug on supporting APUs a long time ago. So quite frankly, to answer your question, I think it is... never.
@ffleader1 That is not a very clever move from AMD, because they have nothing positioned against Nvidia's Jetson type of hardware. So we buy Nvidia APUs even though they are not very FOSS-friendly.
Here is a workaround to run PyTorch on gfx90c: build PyTorch for gfx900 and override gfx90c to gfx900 at runtime.
Build PyTorch
$ git clone https://github.com/pytorch/pytorch.git
$ cd pytorch
$ git submodule update --init --recursive
$ sudo pip3 install -r requirements.txt
$ sudo pip3 install enum34 numpy pyyaml setuptools typing cffi future hypothesis typing_extensions
$ sudo python3 tools/amd_build/build_amd.py
$ sudo PYTORCH_ROCM_ARCH=gfx900 USE_ROCM=1 MAX_JOBS=4 python3 setup.py install
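After the build finishes, one quick sanity check (my addition, not part of the original recipe) is to confirm the installed wheel was built against ROCm/HIP at all; `torch.version.hip` is a version string on ROCm builds and `None` on CPU or CUDA builds:

```python
import importlib.util
from typing import Optional

def hip_version() -> Optional[str]:
    """Return the HIP/ROCm version string PyTorch was built against,
    or None for a CPU/CUDA build (or when torch is not installed)."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return torch.version.hip  # version string on a ROCm build, None otherwise

print("HIP version:", hip_version())
```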
Run an example
$ git clone https://github.com/pytorch/examples.git
$ cd examples/mnist
$ sudo pip3 install -r requirements.txt
$ sudo HSA_OVERRIDE_GFX_VERSION=9.0.0 python3 main.py
...
Train Epoch: 14 [51200/60000 (85%)] Loss: 0.027863
Train Epoch: 14 [51840/60000 (86%)] Loss: 0.017484
Train Epoch: 14 [52480/60000 (87%)] Loss: 0.021983
Train Epoch: 14 [53120/60000 (88%)] Loss: 0.003217
Train Epoch: 14 [53760/60000 (90%)] Loss: 0.011038
Train Epoch: 14 [54400/60000 (91%)] Loss: 0.007962
Train Epoch: 14 [55040/60000 (92%)] Loss: 0.018526
Train Epoch: 14 [55680/60000 (93%)] Loss: 0.001039
Train Epoch: 14 [56320/60000 (94%)] Loss: 0.017513
Train Epoch: 14 [56960/60000 (95%)] Loss: 0.028949
Train Epoch: 14 [57600/60000 (96%)] Loss: 0.028286
Train Epoch: 14 [58240/60000 (97%)] Loss: 0.064388
Train Epoch: 14 [58880/60000 (98%)] Loss: 0.002042
Train Epoch: 14 [59520/60000 (99%)] Loss: 0.002829
Test set: Average loss: 0.0280, Accuracy: 9921/10000 (99%)
Notes:
1. Disable some power features for gfx90c: sudo modprobe amdgpu ppfeaturemask=0xfff73fff
2. ROCm: https://docs.amd.com/bundle/ROCm-Downloads-Guide-v5.0/page/ROCm_Installation.html
3. PyTorch branch: master, commit: 815532d40c25e81d8c09b3c36403016bea394aee
You can also use the PyTorch Docker image on gfx90c. Just run it like this. @shridharkini6
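Instead of prefixing each command with HSA_OVERRIDE_GFX_VERSION=9.0.0, the override can also be set from inside Python, as long as it happens before torch is imported (a sketch of mine; the key point is that the variable must be in the process environment before the ROCm runtime initializes):

```python
import os

# HSA_OVERRIDE_GFX_VERSION must be set before torch (and thus the ROCm
# runtime) is imported; setting it after the import has no effect.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "9.0.0")

# On a machine with a ROCm build of PyTorch installed, you could now do:
# import torch
# torch.cuda.is_available()
print(os.environ["HSA_OVERRIDE_GFX_VERSION"])
```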
$ git clone https://github.com/pytorch/examples.git
$ cd examples/mnist
$ pip3 install -r requirements.txt
$ HSA_OVERRIDE_GFX_VERSION=9.0.0 python3 main.py
Note: Your video memory should be at least 2GB.
Maybe you have not tried it, but do you at least think your method will work with unsupported GPUs, like gfx1031 for example?
You may try; run it like this.
$ HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 main.py
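As an aside, these override strings follow the gfx target naming used by LLVM/ROCm: the last two characters of the target name are the minor version and stepping as hex digits, and the leading digits are the decimal major version. A hypothetical helper of mine (not part of ROCm) to compute the dotted form:

```python
def gfx_to_override(target: str) -> str:
    """Convert an LLVM gfx target name (e.g. 'gfx900', 'gfx1030') into
    the dotted form that HSA_OVERRIDE_GFX_VERSION expects.

    The last two characters are minor and stepping (hex digits);
    everything before them is the decimal major version.
    """
    body = target.removeprefix("gfx")
    major, minor, step = body[:-2], body[-2], body[-1]
    return f"{int(major)}.{int(minor, 16)}.{int(step, 16)}"

print(gfx_to_override("gfx1030"))  # -> 10.3.0
print(gfx_to_override("gfx900"))   # -> 9.0.0
```

Note that the workaround in this thread deliberately overrides gfx90c (which would be 9.0.12) to gfx900's 9.0.0: you pass the version of the supported target you want to masquerade as, not your own.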
Wait, I am a bit confused. Maybe I am missing something, but your example is about running a PyTorch example, right? But how do you get ROCm to install on gfx90c or gfx1031 in the first place? Thank you.
1. Docker with PyTorch and ROCm installed: https://docs.amd.com/bundle/AMD-Deep-Learning-Guide-v5.1.3/page/Deep_Learning_Frameworks.html
2. ROCm installation guide: https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.0/page/Overview_of_ROCm_Installation_Methods.html
I have not tried Docker, but for ROCm I am pretty sure the install will only be successful if your GPU is supported. I.e. the ROCm installation will not work on a gfx1031 or lower.
@xfyucg I followed your methods; it looks to me like training is using only the CPU, not the GPU.
import torch
t = torch.tensor([5, 5, 5], dtype=torch.int64, device='cuda')
throws an error like
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Thanks
Run it like this; that works well on my Cezanne platform.
lang@lang-test:~/Videos/pytorch$ HSA_OVERRIDE_GFX_VERSION=9.0.0 python3
Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> t = torch.tensor([5, 5, 5], dtype=torch.int64, device='cuda')
>>>
Tried this as well... ended up with the same error.
@shridharkini6
Can you put the output of $ rocminfo here?
ROCk module is loaded
HSA System Attributes
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
Agent 1
Name: AMD Ryzen 7 4700U with Radeon Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 4700U with Radeon Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 0
Internal Node ID: 0
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 7612028(0x74267c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 7612028(0x74267c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 7612028(0x74267c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 2
Name: gfx90c
Uuid: GPU-XX
Marketing Name:
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 1024(0x400) KB
Chip ID: 5686(0x1636)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1600
BDFID: 1024
Internal Node ID: 1
Compute Unit: 7
SIMDs per CU: 4
Shader Engines: 1
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 524288(0x80000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx90c:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
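If you only need the GPU target out of such a dump, the relevant line is `Name: gfx90c` under the GPU agent. A small illustrative parser of mine (not an official ROCm tool) that pulls the gfx targets out of rocminfo output:

```python
import re

def gpu_targets(rocminfo_text: str) -> list:
    """Extract gfx targets from rocminfo output.

    GPU agents report their target as a 'Name: gfxNNN' line; CPU agents
    report their marketing name instead, so they do not match."""
    return re.findall(r"^\s*Name:\s*(gfx\w+)\s*$", rocminfo_text, re.MULTILINE)

sample = """
  Name:                    AMD Ryzen 7 4700U with Radeon Graphics
  Name:                    gfx90c
"""
print(gpu_targets(sample))  # -> ['gfx90c']
```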
@shridharkini6 Are you using Docker? If yes, try to start your container like this.
sudo docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
I have tried the same... used the rocm/pytorch:latest-base Docker image.
According to https://docs.amd.com/bundle/AMD-Deep-Learning-Guide-v5.1.3/page/Deep_Learning_Frameworks.html, Option 3 ("Install PyTorch Using PyTorch ROCm Base Docker Image"):
docker pull rocm/pytorch:latest-base
NOTE: This will download the base container, which does not contain PyTorch.
So please use rocm/pytorch:latest instead:
docker pull rocm/pytorch:latest
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
sudo modprobe amdgpu ppfeaturemask=0xfff73fff
HSA_OVERRIDE_GFX_VERSION=9.0.0 python3
His hardware is not supported, and neither is yours, I think. APUs in general do not work. Docker won't change unsatisfied hardware prerequisites.
No, gfx90c uses the same ISA as gfx900. So for gfx90c, just override it to gfx900; that actually works. He used rocm/pytorch:latest-base, so he had to build PyTorch for ROCm himself.
@xfyucg I have followed all the procedures you suggested, i.e. used rocm/pytorch:latest-base and compiled PyTorch from source, but I get the same error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Maybe there are some environment issues; that is hard to debug. It's error-prone to build PyTorch yourself. Why not use rocm/pytorch:latest? It's simpler and also the recommended way.
@xfyucg Yes, I tried with rocm/pytorch:latest also; it throws similar errors. I suspect it could be an issue with the base libraries, as @Bengt mentioned.
No. If you install and start the Docker container (rocm/pytorch:latest) correctly, you will get an error like the following.
root@0f962c3a9d38:/var/lib/jenkins# python3
Python 3.7.13 (default, Mar 29 2022, 02:18:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)
root@0f962c3a9d38:~#
After overriding gfx90c to gfx900:
root@0f962c3a9d38:/var/lib/jenkins# HSA_OVERRIDE_GFX_VERSION=9.0.0 python3
Python 3.7.13 (default, Mar 29 2022, 02:18:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>>
Make sure the amdgpu kernel-mode driver is installed. If you use a generic kernel on Ubuntu 20.04, install it as follows.
sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/22.10.3/ubuntu/focal/amdgpu-install_22.10.3.50103-1_all.deb
sudo apt-get install ./amdgpu-install_22.10.3.50103-1_all.deb
amdgpu-install --usecase=dkms
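Before digging further, it may also help to confirm that the driver actually created the device nodes which the docker run flags above (--device=/dev/kfd --device=/dev/dri) pass through. A minimal sketch of mine that only checks for their presence:

```python
import os

def rocm_device_nodes_present() -> bool:
    """Check for the device nodes passed into the container via
    --device=/dev/kfd --device=/dev/dri; if they are absent on the
    host, the amdgpu kernel-mode driver is most likely not loaded."""
    return os.path.exists("/dev/kfd") and os.path.isdir("/dev/dri")

print("ROCm device nodes present:", rocm_device_nodes_present())
```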
Try updating your system's kernel to a version newer than 6.0 and run the commands with the following environment variable set:
HSA_OVERRIDE_GFX_VERSION=9.0.0
You can use export HSA_OVERRIDE_GFX_VERSION=9.0.0 in the shell you are running the commands in to propagate the environment variable to child processes. That is what allowed the rocm/pytorch container to not crash on import or when doing simple tensor operations like torch.tensor([[1,2],[3,4]]).to(torch.device('cuda')).
I tested this on NixOS, branch 22.11, kernel 6.0.13, and the latest rocm/pytorch container with a Ryzen 5600G.
CC @hongxiayang
@shridharkini6 Hi, is your issue resolved on the latest ROCm? If so, can we close this ticket?
Is this still applicable to the latest ROCm?
@shridharkini6 Unfortunately your APU (gfx90c) is not currently supported in the latest ROCm. Thanks!