DKMS creates .thinlto-cache when compiling dkms modules, but it doesn't get removed

Open ptr1337 opened this issue 2 years ago • 15 comments

Hello,

When installing a kernel built with Clang ThinLTO, dkms creates a folder called .thinlto-cache at /usr/lib/modules/$kernel/build/.thinlto. This folder should be removed after the dkms modules have been compiled.

When a kernel is upgraded or removed, the folder stays behind in /usr/lib/modules. My /usr/lib/modules is full of folders from old kernels.

The .thinlto-cache folder is not copied during packaging; I have checked this.

ptr1337 Jan 17 '23 18:01

The folder is created by the kernel build, is it not? As such, it should probably be removed by the kernel itself when we issue the make clean command.

I don't think dkms should know about details like LTO (thin or not) or be responsible for removing its artefacts.

From a very quick look this seems like a kernel bug: $(KBUILD_EXTMOD)/.thinlto-cache is used in the clean section, while $(extmod_prefix).thinlto-cache is used when setting the cache directory.

evelikov Jan 18 '23 11:01

cc @samitolvanen

nickdesaulniers Jan 18 '23 18:01

extmod_prefix is just KBUILD_EXTMOD with a slash:

export extmod_prefix = $(if $(KBUILD_EXTMOD),$(KBUILD_EXTMOD)/)

However, it looks like it might not be defined when the --thinlto-cache-dir flag is set. Perhaps you can test moving that definition before the LTO flags are set?
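
The ordering concern can be illustrated with a small Python sketch. This is not kbuild code: `expand` is a hypothetical helper that mimics how make substitutes `$(name)` references, with undefined variables expanding to an empty string as they do in GNU make. It assumes the linker flag is expanded at the point where it is appended, and the module path is made up for illustration.

```python
import re


def expand(text, variables):
    """Replace each $(name) with its current value; undefined names become ''."""
    return re.sub(r"\$\((\w+)\)", lambda m: variables.get(m.group(1), ""), text)


def extmod_prefix(kbuild_extmod):
    """Mirror $(if $(KBUILD_EXTMOD),$(KBUILD_EXTMOD)/)."""
    return f"{kbuild_extmod}/" if kbuild_extmod else ""


FLAG = "--thinlto-cache-dir=$(extmod_prefix).thinlto-cache"

# Hypothetical external module directory, for illustration only.
variables = {"KBUILD_EXTMOD": "/var/lib/dkms/example/build"}

# Case 1: the flag is expanded before extmod_prefix exists, so the prefix is empty.
flag_too_early = expand(FLAG, variables)

# Case 2: extmod_prefix is defined first, then the flag is expanded.
variables["extmod_prefix"] = extmod_prefix(variables["KBUILD_EXTMOD"])
flag_after_definition = expand(FLAG, variables)

print(flag_too_early)
print(flag_after_definition)
```

In the first case the cache path is relative, so it would resolve against whatever directory the build runs in rather than against the module directory.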

samitolvanen Jan 18 '23 20:01

@ptr1337 can you try the above suggestion?

@samitolvanen moving the export(s) further up makes sense to me. Considering you're the owner/maintainer of said code, can you send a patch upstream? Thanks o/

evelikov Feb 07 '23 12:02

Could you send me a patch that I can test?

ptr1337 Feb 09 '23 15:02

How about this patchset, and then setting the cache dir to /tmp or something similar? I will test this in the coming days.

https://patchwork.kernel.org/project/linux-kbuild/patch/[email protected]/#24308209

ptr1337 Mar 11 '23 21:03

Any news here? On CachyOS we provide an extra kernel variant that is built with ThinLTO. When people use dkms modules, a massive build cache accumulates in /usr/lib/modules over time. dkms also throws a warning because the kernel module directories are still there.

ptr1337 Jan 05 '24 20:01

IIRC the issue was that the kernel Makefiles are not properly setting the respective variables. The linked patch may sidestep that by storing the files outside of /usr/lib/modules.

Doubt I'll be working on this, but here are some general ideas:

  • compare the file listings of at least /usr/lib/modules and /var across a) a clean system, b) after dkms add/build/install, and c) after dkms remove
  • tweak the make invocations - use make V=1 ... or make V=12 ... and check the log
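
The listing comparison could be scripted roughly as below. This is only a sketch: `snapshot` and `leftovers` are hypothetical helper names, and the demo runs against a throwaway temporary directory with a made-up kernel name instead of the real /usr/lib/modules and /var.

```python
import os
import tempfile


def snapshot(roots):
    """Return the set of all file and directory paths found below the given roots."""
    paths = set()
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                paths.add(os.path.join(dirpath, name))
    return paths


def leftovers(clean, after_remove):
    """Paths present after 'dkms remove' that were absent on the clean system."""
    return sorted(after_remove - clean)


# Demo on a temporary directory; the kernel name is invented for illustration.
with tempfile.TemporaryDirectory() as root:
    clean = snapshot([root])                       # a) clean system
    cache = os.path.join(root, "6.8.9-example", "build", ".thinlto-cache")
    os.makedirs(cache)                             # b) stand-in for a dkms build
    after_remove = snapshot([root])                # c) stand-in for after dkms remove
    demo_leftovers = [os.path.relpath(p, root)
                      for p in leftovers(clean, after_remove)]

print(demo_leftovers)
```

On a real system the three snapshots would be taken at different times, so each set would need to be written to a file and reloaded before comparing.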

evelikov Jan 09 '24 10:01

Let's get this issue fixed asap rocky. It was a blocker for me because an old dkms module on an old kernel version was not getting deleted, which caused issues installing a newer version of that program.

LethalManBoob May 05 '24 10:05

@LethalManBoob I admire your enthusiasm.

If you can reproduce the issue and can provide some feedback to my previous post https://github.com/dell/dkms/issues/292#issuecomment-1882841116 that would be great - @ptr1337 in case you've missed it.

Alternatively, a clear reproducer would be appreciated. One which includes:

  • what's the OS used to trigger this
  • the kernel version and source if not kernel.org
  • version of clang and origin (distro one, binaries from llvm, other)
  • .config file and kernel build command/recipe

All in all, it sounds more like general kernel debugging/hacking ;-)

evelikov May 07 '24 15:05

@evelikov

This can be reproduced with the following:

  • Distro: Archlinux (or Archlinux-based)
  • Kernel: any 6.x kernel (I don't know whether it was already present before that)
  • Clang version: Archlinux clang (reproducible with 15, 16, 17 and 18)
  • Config file: just the Archlinux default config, but kernel compiled with ThinLTO

If you have a kernel installed with ThinLTO, simply install any dkms module (nvidia-dkms, for example), then either remove the kernel or upgrade it. Leftovers remain in /usr/lib/modules/$kernelversion+name
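
To check a system for such leftovers, something along these lines might help. It is a sketch only: `find_thinlto_caches` and `directory_size` are hypothetical helpers, and the demo builds a fake modules tree in a temporary directory with an invented kernel name.

```python
import os
import tempfile

CACHE_NAME = ".thinlto-cache"


def find_thinlto_caches(modules_root):
    """Return sorted paths of every .thinlto-cache directory below modules_root."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(modules_root):
        if CACHE_NAME in dirnames:
            found.append(os.path.join(dirpath, CACHE_NAME))
            dirnames.remove(CACHE_NAME)  # no need to descend into the cache itself
    return sorted(found)


def directory_size(path):
    """Sum the sizes in bytes of all files below path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


# Demo on a fake modules tree; the kernel name is invented for illustration.
with tempfile.TemporaryDirectory() as modules_root:
    cache = os.path.join(modules_root, "6.8.9-example", "build", CACHE_NAME)
    os.makedirs(cache)
    with open(os.path.join(cache, "entry"), "wb") as fh:
        fh.write(b"x" * 1024)
    demo_report = [(os.path.relpath(p, modules_root), directory_size(p))
                   for p in find_thinlto_caches(modules_root)]

print(demo_report)
```

On a real installation the call would be `find_thinlto_caches("/usr/lib/modules")`; the script only reports and deliberately deletes nothing.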

ptr1337 May 07 '24 15:05

> but kernel compiled with ThinLTO

@ptr1337 can you provide a PKGBUILD that achieves this?

evelikov May 08 '24 14:05

@evelikov

https://github.com/CachyOS/linux-cachyos/blob/master/linux-cachyos/PKGBUILD#L128

Change the value here to "thin" and then build the kernel. You can also fetch prebuilt kernels here:

https://mirror.cachyos.org/repo/x86_64_v3/cachyos-v3/linux-cachyos-lto-6.8.9-4-x86_64_v3.pkg.tar.zst
https://mirror.cachyos.org/repo/x86_64_v3/cachyos-v3/linux-cachyos-lto-headers-6.8.9-4-x86_64_v3.pkg.tar.zst

Be aware that the prebuilt ones only run on CPUs that support x86-64-v3, and you might want to pull in the keyring before installing the package.

ptr1337 May 08 '24 14:05

This indeed is a kernel bug.

--thinlto-cache-dir was set to $(extmod_prefix).thinlto-cache at line 945 https://github.com/torvalds/linux/blob/45db3ab70092637967967bfd8e6144017638563c/Makefile#L945

But extmod_prefix is only assigned later, at line 1095 https://github.com/torvalds/linux/blob/45db3ab70092637967967bfd8e6144017638563c/Makefile#L1095

Therefore extmod_prefix is still empty when the flag is set, and the .thinlto-cache directory is always created in the working directory.

It also won't be deleted when executing make clean, because in the clean: target the path is written as $(KBUILD_EXTMOD)/.thinlto-cache https://github.com/torvalds/linux/blob/45db3ab70092637967967bfd8e6144017638563c/Makefile#L1785-L1786

By moving the assignment of extmod_prefix before line 945, this bug can be resolved.
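
A rough way to see why the two paths never meet, as a Python sketch rather than actual kbuild logic. The function names and both directories are hypothetical, and it assumes the module build runs with the kernel build directory as its working directory, which matches where the folder was reported in this issue.

```python
import posixpath


def cache_dir_created(working_dir, prefix_at_flag_time):
    """Where --thinlto-cache-dir=$(extmod_prefix).thinlto-cache ends up on disk."""
    return posixpath.normpath(
        posixpath.join(working_dir, prefix_at_flag_time + ".thinlto-cache"))


def cache_dir_cleaned(working_dir, kbuild_extmod):
    """What the clean: target removes, i.e. $(KBUILD_EXTMOD)/.thinlto-cache."""
    return posixpath.normpath(
        posixpath.join(working_dir, kbuild_extmod, ".thinlto-cache"))


# Hypothetical directories, for illustration only.
working_dir = "/usr/lib/modules/6.8.9-example/build"
kbuild_extmod = "/var/lib/dkms/example/build"

# Current ordering: extmod_prefix is still empty when the flag is set.
created_buggy = cache_dir_created(working_dir, "")
# Proposed ordering: extmod_prefix is already $(KBUILD_EXTMOD)/.
created_fixed = cache_dir_created(working_dir, kbuild_extmod + "/")
cleaned = cache_dir_cleaned(working_dir, kbuild_extmod)

print(created_buggy)
print(created_fixed)
print(cleaned)
```

With the empty prefix the cache lands under the kernel build directory while clean looks in the module directory; once the prefix is defined first, both resolve to the same path.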

I have opened a bug report https://bugzilla.kernel.org/show_bug.cgi?id=218825

xuzhen May 09 '24 15:05

The ThinLTO caching was removed in linux-next https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=aba091547ef6159d52471f42a3ef531b7b660ed8

xuzhen May 12 '24 15:05

Thanks, the issue is fixed after cherry-picking the patch into the kernel. We can close this issue once it is merged upstream.

ptr1337 May 19 '24 16:05

Let's hope that when/if the ThinLTO caching gets reintroduced, the bug won't come back with it :crossed_fingers:

Closing issue

evelikov Jun 04 '24 19:06

Thanks for finding this issue, I'm really glad that it has been fixed. As a distribution deploying ThinLTO kernels, this was a really bad experience for many of our users.

ptr1337 Jun 04 '24 19:06