Max Weiss

4 comments by Max Weiss

It looks like the function `amd::smi::GetProcessInfoForPID` returns too early when files are missing from `/sys/devices/virtual/kfd/kfd/proc/`. But I'm not sure if changing this function to just ignore the missing files...

Unfortunately, ROCm 6.1.0 has the same issue. The host has four GPUs, but we mount only one into the container:
```
$ docker run --rm -it -v /opt/work:/opt/work --device=/dev/dri/card1 --device=/dev/dri/renderD128...
```

ROCm 6.4.1 seems to have issues with MI300A and MI300X GPUs. I've created a similar issue in the ROCm repo: https://github.com/ROCm/ROCm/issues/4759. Older ROCm versions work fine.

I think this is just a display error/bug. Does `rocminfo` show the two GPUs? On our host, the rocm-smi output in the container looks similar to yours, but `rocminfo` and...