ROCK-Kernel-Driver
Do maintain a compatibility list
From the ROC thunk README: "We recommend reading the full compatibility [...] details which are available in the ROCk github:" (this one). This currently reads like a bad joke, because there is no such information here, and compatibility problems are aplenty.
How are you not maintaining a compatibility list between the ROCm stack, the amdgpu kernel module and the Linux kernel versions? It would be a grid along these lines (the numbers are obviously wrong, since they are what I am asking for):
Compatibility matrix - supported versions
| ROCm tag | First amdgpu | Last amdgpu | First kernel | Last kernel |
|---|---|---|---|---|
| 4.5.2 | 5.11.4 | 5.11.32 | 4.13 | 5.10 |
| 4.5.0 | 5.9.5 | 5.11.21 | 4.13 | 5.08 |
You could even add the GPU models to the grid.
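To show how such a grid could be consumed, here is a minimal sketch of a lookup over it. The rows are the placeholder numbers from the table above, not real compatibility data, and the names `Row`, `MATRIX` and `rocm_tags_for` are made up for illustration.

```python
from typing import List, NamedTuple, Tuple

Version = Tuple[int, ...]


def parse(text: str) -> Version:
    """Turn a dotted version such as '5.11.32' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


class Row(NamedTuple):
    rocm: str
    amdgpu_first: Version
    amdgpu_last: Version
    kernel_first: Version
    kernel_last: Version


# Placeholder rows copied from the example grid; NOT real compatibility data.
MATRIX = [
    Row("4.5.2", parse("5.11.4"), parse("5.11.32"), parse("4.13"), parse("5.10")),
    Row("4.5.0", parse("5.9.5"), parse("5.11.21"), parse("4.13"), parse("5.08")),
]


def rocm_tags_for(amdgpu: str, kernel: str) -> List[str]:
    """List the ROCm tags whose ranges contain both versions.

    Only the first three amdgpu components and the first two kernel
    components are compared, matching the precision of the grid.
    """
    a = parse(amdgpu)[:3]
    k = parse(kernel)[:2]
    return [
        row.rocm
        for row in MATRIX
        if row.amdgpu_first <= a <= row.amdgpu_last
        and row.kernel_first <= k <= row.kernel_last
    ]


if __name__ == "__main__":
    print(rocm_tags_for("5.11.32.21.40", "5.4.0"))
```

Adding GPU models would just mean one more field per row and one more condition in the filter.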
For that you would need a version number, but you have one as Rui Teng regularly makes commit bumps...
commit 5e2158164e449dcf2d956bea8b732f5b5290355d
Author: Rui Teng <[email protected]>
Date: Fri Sep 17 13:39:36 2021 +0800
Bump AMDGPU version to 5.11.32.21.40
Signed-off-by: Rui Teng <[email protected]>
Change-Id: I8bf3e783824e74523f5f0a284fcd942b79e317d4
... and you would need that version number to be stored in the kernel module. You have the module macros for that, for example.
I have a 5.4.0-91 Ubuntu 20.04 kernel, and I can fdopen /dev/kfd fine. I have a 5.10-ish Debian 11 kernel, and I can't fdopen /dev/kfd at all.
Before blindly recompiling my kernel module, I should be able to see where I'm at.
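For reference, this is roughly the check I would like to be able to run. When a module declares a version, the kernel exposes it as `/sys/module/<name>/version`; whether a given amdgpu build actually exports one is an assumption here, so the sketch returns `None` when the file is missing. The function name and the `sysfs_root` parameter are only for illustration.

```python
import platform
from pathlib import Path
from typing import Optional


def module_version(name: str = "amdgpu", sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the version a loaded module reports via sysfs, or None.

    None means the module is not loaded, or it does not export a version.
    """
    path = Path(sysfs_root) / name / "version"
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("kernel:", platform.release())
    print("amdgpu:", module_version() or "not loaded, or no version exported")
```

With both values in hand, a compatibility grid would tell me whether a rebuild is even worth trying.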
P.S. Ok, for me it was a matter of being in the render group, but still! :sweat_smile:
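In case it saves someone else the detour, here is a small sketch that rules out the permission problem first. It makes no assumption about the group name: it reports whichever group owns the device node and whether the current process is in it. The function name `device_access` is hypothetical, and the `grp` module is Unix-only.

```python
import grp
import os


def device_access(path: str = "/dev/kfd") -> dict:
    """Describe whether the current process can open a device node."""
    info = {"path": path, "exists": os.path.exists(path)}
    if not info["exists"]:
        return info
    st = os.stat(path)
    try:
        info["group"] = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        # The gid has no entry in the group database; show the number.
        info["group"] = str(st.st_gid)
    # Membership via the effective gid or the supplementary groups.
    info["in_group"] = st.st_gid == os.getegid() or st.st_gid in os.getgroups()
    info["can_open"] = os.access(path, os.R_OK | os.W_OK)
    return info


if __name__ == "__main__":
    print(device_access())
```

If `in_group` is false and `can_open` is false, adding the user to the reported group (and logging in again) is the first thing to try before touching the kernel module.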
The compatibility lists are all kept in the main ROCm documentation. We've got documentation people who keep tabs on that. I should update the README here and in ROCT to make things clearer. For example, we have https://docs.amd.com/bundle/ROCm-Release-Notes-v5.4.3/page/About_This_Document.html
And yes, the whole video/render thing is a pain. At least you got that fixed up.
The big issue is that the amdgpu-pro driver (which we share the kernel code with) has its own versioning. For example, the 5.11.32.21.40 version above means:
- 5.11: code base
- 32: promotions from staging to mainline on this code base
- 21.40: branched for the amdgpu-pro 21.40 release
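As a rough illustration of that breakdown, the version string can be split like this. It is only a sketch that assumes the five-component layout described above always holds; the field names and `parse_amdgpu_version` are made up for the example.

```python
from typing import NamedTuple


class AmdgpuVersion(NamedTuple):
    code_base: str     # e.g. "5.11"
    promotions: int    # staging-to-mainline promotions on this code base
    pro_release: str   # amdgpu-pro release the branch was cut for, e.g. "21.40"


def parse_amdgpu_version(text: str) -> AmdgpuVersion:
    """Split a version such as '5.11.32.21.40' into its three parts."""
    parts = text.strip().split(".")
    if len(parts) != 5 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected five numeric components, got {text!r}")
    return AmdgpuVersion(
        code_base=".".join(parts[:2]),
        promotions=int(parts[2]),
        pro_release=".".join(parts[3:]),
    )


if __name__ == "__main__":
    print(parse_amdgpu_version("5.11.32.21.40"))
```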
So we won't have a nice alignment there, but the documentation linked above at least gives the whole OS + base kernel + ROCm version compatibility. It just lacks the amdgpu version, which I agree we should have added for reference. Hopefully this helps. And I'll try to clean up the README links.