DKMS module for NVIDIA driver failed to build after upgrade to 43360.
I upgraded Clear Linux from 43300 to 43360. NVIDIA driver 570.144 failed to build its kernel module and generated an 8.3 MB failure log.
$ head -n 120 /var/log/nvidia-installer.log
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Tue May 6 08:40:14 2025
installer version: 570.144
PATH: /usr/local/bin:/usr/bin/haswell/avx512_1:/usr/bin/haswell:/usr/bin:/opt/3rd-party/bin:/usr/share/bcc/tools:/usr/share/bcc/tools/old:/opt/cuda-12.8.1_570.124.06/bin:/opt/cuda/bin:/opt/nvidia/bin:/usr/share/bcc/tools:/usr/share/bcc/tools/old:/opt/cuda-12.8.1_570.124.06/bin:/opt/cuda/bin:/opt/nvidia/bin
nvidia-installer command line:
./nvidia-installer
--kernel-name=6.6.89-1486.ltsprev
--no-precompiled-interface
--no-nvidia-modprobe
--no-distro-scripts
--no-rebuild-initramfs
--skip-module-load
--no-nouveau-check
--no-disable-nouveau
--no-x-check
--dkms
--silent
--allow-installation-with-running-driver
--kernel-module-type=open
--compat32-prefix=/opt/nvidia
--compat32-libdir=lib32
--x-prefix=/opt/nvidia
--x-module-path=/opt/nvidia/lib64/xorg/modules
--x-library-path=/opt/nvidia/lib64
--x-sysconfig-path=/etc/X11/xorg.conf.d
--opengl-prefix=/opt/nvidia
--opengl-libdir=lib64
--wine-prefix=/opt/nvidia
--utility-prefix=/opt/nvidia
--utility-libdir=lib64
--xdg-data-dir=/opt/nvidia/share
--documentation-prefix=/opt/nvidia
--application-profile-path=/etc/nvidia/nvidia-application-profiles-rc.d
--module-signing-key-path=/opt/nvidia/share
--force-libglx-indirect
--glvnd-egl-config-path=/etc/glvnd/egl_vendor.d
--egl-external-platform-config-path=/etc/egl/egl_external_platform.d
--systemd-unit-prefix=/usr/local/lib/systemd/system
--systemd-sleep-prefix=/usr/local/lib/systemd/system-sleep
Using built-in stream user interface
-> Detected 64 CPUs online; setting concurrency level to 32.
-> Scanning the initramfs with lsinitrd...
-> /usr/bin/lsinitrd requires a file path argument, but none was given.
-> /usr/bin/lsinitrd requires a file path argument, but none was given.
-> Initramfs scan failed.
WARNING: Unable to determine the default library path. The path /opt/nvidia/lib will be used, but this path was not detected in the ldconfig(8) cache, and no directory exists at this path, so it is likely that libraries installed there will not be found by the loader.
WARNING: Unable to determine the default X library path. The path /opt/nvidia/lib will be used, but this path was not detected in the ldconfig(8) cache, and no directory exists at this path, so it is likely that libraries installed there will not be found by the loader.
WARNING: An NVIDIA kernel module 'nvidia-modeset' appears to be already loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Some of the sanity checks that nvidia-installer performs to detect potential installation problems are not possible while an NVIDIA kernel module is running.
-> Would you like to continue installation and skip the sanity checks? If not, please abort the installation, then close any programs which may be using the NVIDIA GPU(s), and attempt installation again. (Answer: Continue installation)
WARNING: Continuing installation despite the presence of a loaded NVIDIA kernel module. Some sanity checks will not be performed. It is strongly recommended that you reboot your computer after installation is complete. If the installation is not successful after rebooting the computer, you can run `nvidia-uninstall` to attempt to remove the NVIDIA driver.
-> Kernel module load tests will be skipped.
-> Installing NVIDIA driver version 570.144.
-> Not probing for precompiled kernel interfaces.
-> Performing CC sanity check with CC="/usr/bin/cc".
-> Performing CC check.
-> Not probing for precompiled kernel interfaces.
-> Kernel source path: '/lib/modules/6.6.89-1486.ltsprev/build'
-> Kernel output path: '/lib/modules/6.6.89-1486.ltsprev/build'
-> Performing Compiler check.
-> Performing Dom0 check.
-> Performing Xen check.
-> Performing PREEMPT_RT check.
-> Performing vgpu_kvm check.
-> Cleaning kernel module build directory.
executing: 'cd kernel-open; /usr/bin/make -k -j32 NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/6.6.89-1486.ltsprev/build" SYSOUT="/lib/modules/6.6.89-1486.ltsprev/build" clean'...
rm -f -r conftest
make[1]: Entering directory '/usr/lib/modules/6.6.89-1486.ltsprev/build'
make[1]: Leaving directory '/usr/lib/modules/6.6.89-1486.ltsprev/build'
-> Failed to estimate output lines: /bin/sh: line 1: /lib/modules/6.6.87-1484.ltsprev/build/.config: No such file or directory
conftests:300 objects:198 modules:5
-> Building kernel modules
executing: 'cd kernel-open; /usr/bin/make -k -j32 NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/6.6.89-1486.ltsprev/build" SYSOUT="/lib/modules/6.6.89-1486.ltsprev/build" '...
make[1]: Entering directory '/usr/lib/modules/6.6.89-1486.ltsprev/build'
warning: the compiler differs from the one used to build the kernel
The kernel was built by: gcc (Clear Linux OS for Intel Architecture) 14.2.1 20250410 releases/gcc-14.2.0-1067-g779e002a1d
You are using: gcc (Clear Linux OS for Intel Architecture) 15.1.1 20250429 releases/gcc-15.1.0-15-g68a75e3c0d
Warning: Compiler version check failed:
The major and minor number of the compiler used to
compile the kernel:
gcc (Clear Linux OS for Intel Architecture) 14.2.1 20250410 releases/gcc-14.2.0-1067-g779e002a1d, GNU ld (GNU Binutils) 2.44.0
does not match the compiler used here:
gcc (Clear Linux OS for Intel Architecture) 15.1.1 20250429 releases/gcc-15.1.0-15-g68a75e3c0d
Copyright (C) 2025 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
It is recommended to set the CC environment variable
to the compiler that was used to compile the kernel.
To skip the test and silence this warning message, set
the IGNORE_CC_MISMATCH environment variable to "1".
However, mixing compiler versions between the kernel
and kernel modules can result in subtle bugs that are
difficult to diagnose.
*** Failed CC version check. ***
SYMLINK /tmp/selfgz89752/NVIDIA-Linux-x86_64-570.144/kernel-open/nvidia/nv-kernel.o
SYMLINK /tmp/selfgz89752/NVIDIA-Linux-x86_64-570.144/kernel-open/nvidia-modeset/nv-modeset-kernel.o
CONFTEST: hash__remap_4k_pfn
CONFTEST: set_pages_uc
CONFTEST: list_is_first
CONFTEST: set_memory_uc
CONFTEST: set_memory_array_uc
CONFTEST: set_pages_array_uc
CONFTEST: ioremap_cache
CONFTEST: ioremap_wc
CONFTEST: ioremap_driver_hardened
CONFTEST: ioremap_driver_hardened_wc
CONFTEST: ioremap_cache_shared
CONFTEST: pci_get_domain_bus_and_slot
Here is the tail:
$ tail -n 20 /var/log/nvidia-installer.log
/tmp/selfgz89752/NVIDIA-Linux-x86_64-570.144/kernel-open/common/inc/nv-linux.h: In function 'nv_phys_to_dma':
/tmp/selfgz89752/NVIDIA-Linux-x86_64-570.144/kernel-open/common/inc/nv-linux.h:711:12: error: implicit declaration of function 'phys_to_dma'; did you mean 'nv_phys_to_dma'? [-Wimplicit-function-declaration]
711 | return phys_to_dma(dev, pa);
| ^~~~~~~~~~~
| nv_phys_to_dma
/tmp/selfgz89752/NVIDIA-Linux-x86_64-570.144/kernel-open/common/inc/nv-linux.h: In function 'nv_is_dma_direct':
/tmp/selfgz89752/NVIDIA-Linux-x86_64-570.144/kernel-open/common/inc/nv-linux.h:1217:9: error: implicit declaration of function 'dma_is_direct'; did you mean 'd_is_dir'? [-Wimplicit-function-declaration]
1217 | if (dma_is_direct(get_dma_ops(dev)))
| ^~~~~~~~~~~~~
| d_is_dir
make[3]: *** [scripts/Makefile.build:243: /tmp/selfgz89752/NVIDIA-Linux-x86_64-570.144/kernel-open/nvidia/i2c_nvswitch.o] Error 1
make[3]: Target '/tmp/selfgz89752/NVIDIA-Linux-x86_64-570.144/kernel-open/' not remade because of errors.
make[2]: *** [/usr/lib/modules/6.6.89-1486.ltsprev/build/Makefile:1924: /tmp/selfgz89752/NVIDIA-Linux-x86_64-570.144/kernel-open] Error 2
make[2]: Target 'modules' not remade because of errors.
make[1]: *** [Makefile:234: __sub-make] Error 2
make[1]: Target 'modules' not remade because of errors.
make[1]: Leaving directory '/usr/lib/modules/6.6.89-1486.ltsprev/build'
make: *** [Makefile:115: modules] Error 2
ERROR: The nvidia kernel module was not created.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
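Because the full log is 8.3 MB, and an error in a shared header such as nv-linux.h can be repeated for every object file that includes it, the real failures are easy to lose. The following is a minimal Python sketch, assuming GCC's usual `file:line:col: error: message` diagnostic format as in the tail above. It collects each distinct error once and lists the functions GCC reported as implicitly declared, which are the symbols worth checking against the installed kernel headers. The function names `summarize_errors` and `report` are hypothetical helpers, not part of any NVIDIA tool.

```python
import re

# GCC diagnostic lines look like "path:line:col: error: message".
ERROR_RE = re.compile(r"^(?P<file>[^:\s]+):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.*)$")
# Extracts the function name from GCC's implicit-declaration message.
IMPLICIT_RE = re.compile(r"implicit declaration of function '([^']+)'")


def summarize_errors(lines):
    """Return (unique errors, implicitly declared functions), in first-seen order.

    Each error is a (file, line, message) tuple; repeats of the same
    diagnostic from different translation units are collapsed.
    """
    errors = {}  # dicts keep insertion order (Python 3.7+)
    missing = []
    for raw in lines:
        m = ERROR_RE.match(raw.strip())
        if m is None:
            continue  # source excerpts, make output, warnings, etc.
        errors[(m["file"], int(m["line"]), m["msg"])] = None
        sym = IMPLICIT_RE.search(m["msg"])
        if sym and sym.group(1) not in missing:
            missing.append(sym.group(1))
    return list(errors), missing


def report(path="/var/log/nvidia-installer.log"):
    """Print a short summary of the compile errors in an installer log."""
    with open(path, errors="replace") as log:
        errors, missing = summarize_errors(log)
    for file, line, msg in errors:
        print(f"{file}:{line}: {msg}")
    if missing:
        print("implicitly declared functions:", ", ".join(missing))
```

Calling `report()` on the log above should reduce it to the `phys_to_dma` and `dma_is_direct` errors from nv-linux.h, however many times they were emitted.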
We can't really do anything about an external vendor's driver being incompatible with the kernel version we ship. You may have to try a different kernel bundle from us, or may need to get a different version of the vendor's driver package that is compatible with the kernel version you've installed.
For reference, today we ship the following kernel versions:
| bundle | version |
|---|---|
| kernel-native | 6.14.5 |
| kernel-ltscurrent | 6.12.27 |
| kernel-ltsprev | 6.6.89 |
I couldn't find any information about compatible kernel versions for that driver, but if I had to guess, I'd use our kernel-ltscurrent bundle.
We follow the kernel classifications on https://www.kernel.org/ -- as you can see, our kernel-native tracks the latest "stable", kernel-ltscurrent tracks the latest "longterm", and kernel-ltsprev tracks the second-latest "longterm".
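The mapping described above, with native tracking the latest stable release, ltscurrent the latest longterm, and ltsprev the second-latest longterm, can be sketched as a small Python function. The input shape (dicts with `moniker` and `version` keys) is an assumption modeled loosely on kernel.org's release listing, and `map_bundles` is a hypothetical name; this illustrates the rule, not Clear Linux's actual tooling.

```python
def parse_version(v):
    """'6.12.27' -> (6, 12, 27), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in v.split("."))


def map_bundles(releases):
    """Map kernel.org-style release entries to Clear Linux kernel bundles.

    Assumes at least one 'stable' entry and at least two 'longterm' entries.
    """
    stable = [r["version"] for r in releases if r["moniker"] == "stable"]
    longterm = sorted(
        (r["version"] for r in releases if r["moniker"] == "longterm"),
        key=parse_version,
        reverse=True,  # newest longterm first
    )
    return {
        "kernel-native": max(stable, key=parse_version),
        "kernel-ltscurrent": longterm[0],
        "kernel-ltsprev": longterm[1],
    }


releases = [
    {"moniker": "longterm", "version": "6.6.89"},
    {"moniker": "stable", "version": "6.14.5"},
    {"moniker": "longterm", "version": "6.12.27"},
]
print(map_bundles(releases))
```

Parsing into integer tuples matters here: as plain strings, "6.6.89" would sort above "6.12.27", which would swap ltscurrent and ltsprev.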
Part of the issue is that the NVIDIA build scripts check whether the kernel was built with the current GCC, which breaks on any GCC minor-version update until the kernel is rebuilt, usually within a day.
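The check quoted in the installer log compares only the major and minor numbers of the two compilers and can be skipped by setting `IGNORE_CC_MISMATCH` to "1". The following is a rough Python approximation of that logic, assuming banners shaped like the `gcc (...) X.Y.Z ...` lines in the log. The regex and function names are illustrative, not NVIDIA's actual implementation.

```python
import os
import re

# First "X.Y.Z" after a closing parenthesis, as in "gcc (vendor string) 14.2.1 ...".
GCC_VERSION_RE = re.compile(r"\)\s+(\d+)\.(\d+)\.\d+")


def major_minor(banner):
    """Extract (major, minor) from a GCC version banner."""
    m = GCC_VERSION_RE.search(banner)
    if m is None:
        raise ValueError(f"unrecognized compiler banner: {banner!r}")
    return int(m.group(1)), int(m.group(2))


def cc_versions_match(kernel_cc, current_cc, env=os.environ):
    """Approximate the installer's check: compare major.minor only,
    and treat IGNORE_CC_MISMATCH="1" as an explicit override."""
    if env.get("IGNORE_CC_MISMATCH") == "1":
        return True
    return major_minor(kernel_cc) == major_minor(current_cc)
```

With the banners from the log, 14.2 against 15.1 fails the check, while a patch-level bump such as 15.1.0 to 15.1.1 would pass. The current compiler's banner is the first line of `gcc --version`.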
(Email reply to Brett T. Warden's comment of Thu, May 8, 2025: https://github.com/clearlinux/distribution/issues/3304#issuecomment-2863924174)
@bwarden I went all the way back to ltsprev, but I still get the compilation errors caused by the mismatch between the shipped kernel and the available GCC:
warning: the compiler differs from the one used to build the kernel
The kernel was built by: gcc (Clear Linux OS for Intel Architecture) 14.2.1 20250410 releases/gcc-14.2.0-1067-g779e002a1d
You are using: gcc (Clear Linux OS for Intel Architecture) 15.1.1 20250429 releases/gcc-15.1.0-15-g68a75e3c0d
Warning: Compiler version check failed:
The major and minor number of the compiler used to
compile the kernel:
gcc (Clear Linux OS for Intel Architecture) 14.2.1 20250410 releases/gcc-14.2.0-1067-g779e002a1d, GNU ld (GNU Binutils) 2.44.0
does not match the compiler used here:
gcc (Clear Linux OS for Intel Architecture) 15.1.1 20250429 releases/gcc-15.1.0-15-g68a75e3c0d
I can't find a c-extras-gcc14 bundle I can install. Any other options?
The linux-ltsprev rebuild with GCC 15.1 apparently didn't work; I'll try to fix that. I would go with the linux-ltscurrent kernel, though it'll probably be another day or two before its GCC 15.1 build is released.
I downgraded to 43320, the last published release with GCC 14, and successfully built the kernel module. Then I upgraded to the latest release. DKMS did not trigger a rebuild, and the kernel module is still functional.
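One plausible reason no rebuild happened: DKMS records builds per module version and kernel release, and it only needs to build again for a kernel that has no installed copy of the module. Whether the upgrade actually left the kernel release unchanged is an assumption; the thread does not say. The following Python sketch shows that decision. It assumes the dkms 3.x `dkms status` line format (`module/version, kernel, arch: state`); older dkms releases print the fields differently. The helper names are hypothetical.

```python
def installed_kernels(status_output, module):
    """Kernel releases for which `module` is reported as installed.

    Assumes dkms 3.x lines such as
    'nvidia/570.144, 6.6.89-1486.ltsprev, x86_64: installed'.
    """
    kernels = set()
    for line in status_output.splitlines():
        head, _, state = line.rpartition(": ")
        if not state.strip().startswith("installed"):
            continue  # skip 'added', 'built', blank lines, etc.
        fields = [f.strip() for f in head.split(",")]
        if len(fields) < 3 or fields[0].split("/")[0] != module:
            continue
        kernels.add(fields[1])
    return kernels


def needs_rebuild(status_output, module, kernel_release):
    """True if DKMS has no installed build of `module` for `kernel_release`."""
    return kernel_release not in installed_kernels(status_output, module)
```

Feeding it the output of `dkms status` and the result of `uname -r` shows whether the running kernel already has an installed build, in which case an OS upgrade that keeps the same kernel gives DKMS nothing to do.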
In release 43490, linux-ltsprev and linux-ltscurrent are now built successfully with gcc 15.1, which should resolve the gcc version conflict. I'm not sure whether you'll still have compatibility errors between the NVIDIA driver source and our kernel, though.
Thank you! I got NVIDIA driver version 570.153.02 to compile with ltscurrent (org.clearlinux.ltscurrent.6.12.29-1493) on release 43500.