Auto generated dracut configuration will break when there are multiple kernel versions available

Open harish2704 opened this issue 5 years ago • 3 comments

Hi, I am using the packages rock-dkms on Fedora-31 from official repository ( http://repo.radeon.com/rocm/yum/rpm )

The pre-build.sh installed by the latest rpm package looks like the one below. (It differs from the current version in this repo: I couldn't find the quoted lines here, but I don't know where else to file this bug report.)

[Image: screenshot of the installed pre-build.sh]

In the above file, the last two lines add a dracut config file which modifies the firmware path during initrd building.

This logic breaks if there are multiple kernel versions installed on one system.

If there are multiple kernels, it will create multiple /etc/dracut.conf.d/amdgpu-<KERNEL_VERSION>.conf files, and all of them will be included by dracut during the initrd build. As a result, the firmware path (fw_dir) becomes a non-existent path, because it is the concatenation of two firmware paths.
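The concatenation can be reproduced outside of dracut with a small sketch. It assumes that each generated file appends its own firmware directory to fw_dir (the real file contents were only shown in the screenshot), and the kernel versions and temporary directory are hypothetical. Sourcing the files in one bash process is a stand-in for how dracut reads its configuration, not a copy of dracut's code.

```shell
#!/bin/sh
# Sketch only: works in a temporary directory, never touches /etc/dracut.conf.d.
tmp=$(mktemp -d)

# ASSUMPTION: each per-kernel file appends its own firmware directory.
# The kernel versions below are hypothetical.
printf '%s\n' 'fw_dir+="/lib/firmware/5.6.10"' > "$tmp/amdgpu-5.6.10.conf"
printf '%s\n' 'fw_dir+="/lib/firmware/5.6.13"' > "$tmp/amdgpu-5.6.13.conf"

# Source every *.conf file in a single bash process, so each "+=" appends
# to the same fw_dir variable without any separator.
combined=$(bash -c 'fw_dir=""; for f in "$1"/*.conf; do . "$f"; done; printf "%s" "$fw_dir"' _ "$tmp")

echo "fw_dir=$combined"
rm -rf "$tmp"
```

With these two hypothetical files the script prints fw_dir=/lib/firmware/5.6.10/lib/firmware/5.6.13, a single path that does not exist on disk.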

So, the solution is simple. We only need to create a single file, /etc/dracut.conf.d/amdgpu.conf, for all the kernels. The content of the file should be as follows:

add_drivers+=" amdgpu"
fw_dir+="/lib/firmware/$kernel"

"$kernel" is a variable that is set to the version number of the kernel currently being built while dracut is running.

This way, rock-dkms will not break if multiple kernel versions exist on a single system.
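A minimal sketch of why the single file behaves better, assuming (as described above) that the variable kernel is already set when the file is sourced. The kernel versions, the temporary directory and the helper resolve_fw_dir are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Sketch only: the config file is written to a temporary directory.
tmp=$(mktemp -d)

# The two lines proposed above, written literally so that $kernel is
# expanded only when the file is sourced.
printf '%s\n' 'add_drivers+=" amdgpu"' 'fw_dir+="/lib/firmware/$kernel"' > "$tmp/amdgpu.conf"

# Hypothetical helper: source the file once with "kernel" preset,
# standing in for one dracut run for that kernel version.
resolve_fw_dir() {
    bash -c 'kernel=$1; fw_dir=""; . "$2"; printf "%s" "$fw_dir"' _ "$1" "$tmp/amdgpu.conf"
}

fw_a=$(resolve_fw_dir 5.6.10)
fw_b=$(resolve_fw_dir 5.6.13)

echo "$fw_a"
echo "$fw_b"
rm -rf "$tmp"
```

Each run resolves to exactly one directory for the kernel being built, so installing another kernel adds no further config file and nothing gets concatenated.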

harish2704 avatar May 20 '20 19:05 harish2704

Hi Harish, thank you for reporting this problem and for the suggested solution. The engineer who works on our packaging script agrees with your analysis and has implemented the fix you suggested. It will probably not make it into the next release (ROCm 3.5), which is too far along in the process already. It should make it into ROCm 3.6.

fxkamd avatar May 21 '20 20:05 fxkamd

Hi @fxkamd,

Thanks for the confirmation & update.

I respect the release cycle of this project, but I would like to kindly draw your attention to the end-user experience caused by this bug:

  1. Most Linux users do not delete old kernels when they update. Usually everyone keeps more than one kernel.
  2. When this bug happens, the system display does not work. It gets stuck at a blank screen, and that is hard for normal users to debug.

(I was able to troubleshoot this issue only because I had a spare PC and could ssh into the broken system from it.)

So, I suggest releasing the fix for this issue as early as possible.

harish2704 avatar May 23 '20 19:05 harish2704

Hi Harish,

The problem you reported has been in the driver for a long time and has never come to our attention before. It affects not only the ROCm driver, but also the Linux Pro graphics driver. If the problem were common, we would have expected to see bug reports about it before. We also found that your problem was not straightforward to reproduce. You didn't provide exact steps for reproducing it, but we suspect you used the DKMS command line in ways that are not typical for end users who just install our package using yum.

In our opinion the risk of a late fix outweighs the benefit in this case. If you provide steps to reproduce that are less obscure than what we came up with, maybe you'll change our mind.

Best regards, Felix

fxkamd avatar May 25 '20 18:05 fxkamd

Closing off as resolved, since it's been addressed in the packaging scripts

kentrussell avatar Nov 10 '23 16:11 kentrussell