Amund

Results 20 comments of Amund

Ah... there is in fact a Docker-in-Docker error. I'll try to sort it out.

Yep, here it is:
```bash
root@fb554eea83ff:/app# rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.18
Runtime Ext Version: 1.11
System Timestamp Freq.: 1000.000000MHz
Sig. Max...
```

@naromero77amd Sure, I am using a Docker image based on rocm/pytorch:latest (seems to be rocm/pytorch:rocm7.0.2_ubuntu24.04_py3.12_pytorch_release_2.8.0)

@taylding-amd @naromero77amd I created a repo with a streamlined example: https://github.com/Amund/rocm-test-2730

@naromero77amd No build or installation, only what is available in the image. Tested with ROCm 7.0.2 and 7.1: https://github.com/Amund/rocm-test-2730

I've tested it:
```bash
$ sudo docker run -it --rm \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --device=/dev/kfd \
    --device=/dev/dri \
    --group-add video \
    --ipc=host \
    --shm-size 8G \
    -v ./src:/app...
```

@taylding-amd I've checked root's groups in the container:
```bash
root@5e0bcdad7c36:/app# groups
root video
```
Do you have any other tests I can run?
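For anyone repeating this kind of check, the same idea can be sketched in Python from inside the container: confirm that the device nodes passed with `--device` are present and accessible, and list the group IDs of the current process. This is only a minimal sketch under assumptions: the function name `check_gpu_device_access` is hypothetical, and the default paths simply mirror the `/dev/kfd` and `/dev/dri` flags from the `docker run` command in this thread.

```python
import os


def check_gpu_device_access(paths=("/dev/kfd", "/dev/dri")):
    """Report whether each path exists and is readable/writable by this process.

    The function name and default paths are illustrative assumptions;
    the paths mirror the --device flags passed to docker run.
    """
    report = {}
    for path in paths:
        report[path] = {
            "exists": os.path.exists(path),
            "readable": os.access(path, os.R_OK),
            "writable": os.access(path, os.W_OK),
        }
    return report


if __name__ == "__main__":
    for path, status in check_gpu_device_access().items():
        print(path, status)
    # Numeric group IDs of the current process, to compare against
    # the group that owns the device nodes (e.g. via `ls -l /dev/kfd`).
    print("process group IDs:", os.getgroups())
```

A path reported as missing or not writable would point at the container setup (device flags, group membership) rather than at PyTorch itself, though this check alone does not prove that the GPU is usable.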

@naromero77amd From *outside* the container:
```bash
# rocm-smi --showpids
============================ ROCm System Management Interface ============================
===================================== KFD Processes ======================================
No KFD PIDs currently running
==========================================================================================
================================== End of ROCm SMI...
```

@naromero77amd @taylding-amd Note: I updated my Ubuntu PC from 25.04 to 25.10 and removed the rocm7.01 driver from the host to revert to the generic `amdgpu` driver. As a result,...

@naromero77amd As said, I removed the rocm driver from the host, so I confirm `rocm-smi` is **not** installed on the host (`which rocm-smi` doesn't return anything). But I'm starting to...