Do maintain a compatibility list

Open Maxzor opened this issue 3 years ago • 1 comments

From the ROC thunk README: "We recommend reading the full compatibility [...] details which are available in the ROCk github:" (this one). This currently reads as a bad joke, since no such details exist here and compatibility problems are aplenty.

How are you not maintaining a compatibility list between the ROCm stack, the amdgpu kernel module and the Linux kernel versions? That would be a grid like the following (the numbers are obviously wrong, since I am asking for them):

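Compatibility matrix - supported versions

| ROCm tag | First amdgpu | Last amdgpu | First kernel | Last kernel |
|----------|--------------|-------------|--------------|-------------|
| 4.5.2 | 5.11.4 | 5.11.32 | 4.13 | 5.10 |
| 4.5.0 | 5.9.5 | 5.11.21 | 4.13 | 5.08 |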

You could even add the GPU models to the grid.
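
To illustrate how such a grid could be used once it exists, here is a minimal Python sketch of a lookup over it. The rows are the placeholder numbers from the example grid in this post, not real compatibility data, and the names `Row`, `MATRIX` and `supported_rocm_tags` are made up for illustration.

```python
from typing import NamedTuple


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '5.11.32' into (5, 11, 32) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


class Row(NamedTuple):
    rocm: str
    first_amdgpu: str
    last_amdgpu: str
    first_kernel: str
    last_kernel: str


# Placeholder rows copied from the example grid; NOT real compatibility data.
MATRIX = [
    Row("4.5.2", "5.11.4", "5.11.32", "4.13", "5.10"),
    Row("4.5.0", "5.9.5", "5.11.21", "4.13", "5.08"),
]


def supported_rocm_tags(amdgpu: str, kernel: str) -> list[str]:
    """Return the ROCm tags whose amdgpu and kernel ranges both contain the inputs."""
    a, k = parse_version(amdgpu), parse_version(kernel)
    return [
        row.rocm
        for row in MATRIX
        if parse_version(row.first_amdgpu) <= a <= parse_version(row.last_amdgpu)
        and parse_version(row.first_kernel) <= k <= parse_version(row.last_kernel)
    ]
```

With the placeholder rows, `supported_rocm_tags("5.11.30", "5.10")` matches only the 4.5.2 row. The sketch expects plain dotted numeric versions, so a distribution suffix such as `-91` would have to be stripped first.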

For that you would need a version number, but you have one as Rui Teng regularly makes commit bumps...

commit 5e2158164e449dcf2d956bea8b732f5b5290355d
Author: Rui Teng <[email protected]>
Date:   Fri Sep 17 13:39:36 2021 +0800

    Bump AMDGPU version to 5.11.32.21.40
    
    Signed-off-by: Rui Teng <[email protected]>
    Change-Id: I8bf3e783824e74523f5f0a284fcd942b79e317d4

... and you would need that version number to be stored in the kernel module. The parameter macros could be used for that, for example.
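
As a sketch of what the user side could look like: when a kernel module declares a version with `MODULE_VERSION`, the kernel exposes it as `/sys/module/<name>/version`. Whether a given amdgpu build actually declares one is an assumption here, so the hypothetical helper below returns `None` when the file is missing.

```python
from pathlib import Path
from typing import Optional


def read_module_version(module: str = "amdgpu",
                        sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the version string a loaded module exports via sysfs, or None.

    Assumes the module was built with MODULE_VERSION; if it was not (or the
    module is not loaded), the file does not exist and None is returned.
    """
    version_file = Path(sysfs_root) / module / "version"
    try:
        return version_file.read_text().strip()
    except OSError:
        return None
```

Calling `read_module_version()` on a machine with the module loaded would then give the string to look up in a compatibility grid. The `sysfs_root` parameter exists only so the function can be tried against a scratch directory.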

Maxzor, Dec 21 '21 00:12

I have a 5.4.0-91 Ubuntu 20.04 kernel, and I can fdopen /dev/kfd fine. I have a 5.10-ish Debian 11 kernel, and I can't fdopen /dev/kfd at all.

Before blindly recompiling my kernel module, I should be able to see where I'm at.

P.S. Ok, for me it was a matter of being in the render group, but still! :sweat_smile:
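
For anyone hitting the same thing, a minimal Python sketch for checking the device node before suspecting the kernel module. It uses only the standard library; the function name `describe_device_access` is made up for illustration, and which group owns `/dev/kfd` depends on the distribution.

```python
import grp
import os
import stat


def describe_device_access(path: str = "/dev/kfd") -> dict:
    """Report who owns a device node and whether this process can open it."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return {"exists": False}
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        # The gid has no entry in the group database; fall back to the number.
        group = str(info.st_gid)
    return {
        "exists": True,
        "group": group,
        "mode": stat.filemode(info.st_mode),
        # True if the owning group is among this process's groups.
        "in_group": info.st_gid in os.getgroups() or info.st_gid == os.getegid(),
        "can_open_rw": os.access(path, os.R_OK | os.W_OK),
    }
```

If `in_group` is false and `can_open_rw` is false, group membership is the likely culprit. Note that `os.getgroups()` reports the groups of the current process, so a freshly added membership typically shows up only after logging in again.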

Maxzor, Dec 21 '21 13:12

The compatibility lists are all kept in the main ROCm documentation. We've got documentation people who keep tabs on that. I should update the README here and in ROCT to make things clearer. For example, we have https://docs.amd.com/bundle/ROCm-Release-Notes-v5.4.3/page/About_This_Document.html

And yes, the whole video/render thing is a pain. At least you got that fixed up.

The big issue is that the amdgpu-pro driver (which we share the kernel code with) has its own versioning. For example, the 5.11.32.21.40 version above means:

- 5.11 - code base
- 32 - promotions from staging-to-mainline on this code base
- 21.40 - branched for amdgpu-pro 21.40 release
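
The breakdown above can be sketched as a small Python parser. The field names (`code_base`, `promotions`, `pro_release`) are made up for illustration, and the sketch assumes every version string has exactly five numeric components like the example.

```python
from typing import NamedTuple


class AmdgpuVersion(NamedTuple):
    code_base: str    # e.g. "5.11"
    promotions: int   # staging-to-mainline promotions on this code base
    pro_release: str  # amdgpu-pro release the branch was cut for, e.g. "21.40"


def parse_amdgpu_version(text: str) -> AmdgpuVersion:
    """Split a five-part version such as '5.11.32.21.40' into its three fields."""
    parts = text.strip().split(".")
    if len(parts) != 5 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected five dot-separated numbers, got {text!r}")
    return AmdgpuVersion(
        code_base=".".join(parts[:2]),
        promotions=int(parts[2]),
        pro_release=".".join(parts[3:]),
    )
```

For the version in the commit quoted earlier, `parse_amdgpu_version("5.11.32.21.40")` gives code base `5.11`, `32` promotions, and amdgpu-pro release `21.40`.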

So we won't have a nice alignment there, but the documentation linked above at least gives the whole OS+base kernel+ROCm version compatibility. Just not the amdgpu version. Which I agree we should have added to this for reference. Hopefully this helps. And I'll try to clean up the README links.

kentrussell, Feb 24 '23 22:02