Building the Linux kernel using LTO

Open InBetweenNames opened this issue 7 years ago • 123 comments

I find it interesting that there hasn't been more push to build the kernel using LTO. I've found a couple of mailing list threads about it, including a patchset to let it happen, but there wasn't a lot of interest upstream. I've created this issue as a way to track what the current LTO progress in the kernel is, and possibly even add some patchsets to let it happen. I know I'd for sure use it on my router if I could with OpenWRT.

InBetweenNames avatar Jan 27 '18 16:01 InBetweenNames

@InBetweenNames I would use it on my router too. I think at the time the GCC LTO toolchain wasn't very mature and few were able to make much use of it, particularly on embedded* Linux, where there would be most interest. Without that buy-in the kernel devs weren't going to let the patches in.

Perhaps resurrecting the patch set and getting it working again could be successful now that LTO support is pretty ubiquitous in distros and most embedded devs must be using it by now for their user space.

* Embedded toolchains tend to be quite conservative and stick around for a while.

sjnewbury avatar Oct 22 '18 10:10 sjnewbury

Seems some remnants of those patches are still in the kernel (notably DISABLE_LTO, so it doesn't use it for the vDSO), so I tried with 4.19.1. The patchset formerly used scripts/gcc-ld, but that didn't work for me, so I used gold. I doubt it's accomplishing anything built this way (size barely changed with other defaults). Despite using gcc-ar, it was also complaining about the LTO plugin unless -ffat-lto-objects was passed. The patchset used to use -fwhole-program too, but that didn't work. Nonetheless, thought I'd do the crazy thing and build the kernel with:

make -j8 AR=gcc-ar NM=gcc-nm LD=ld.gold KCFLAGS="-march=native -O3 -falign-functions=32 -fipa-pta -fno-semantic-interposition -fgraphite-identity -floop-nest-optimize -flto=8 -ffat-lto-objects" DISABLE_LTO=-fno-lto

Which.. worked.. and booted fine. I am now the proud owner of a kernel that is 30% bigger than before, probably not faster, and set out to kill my dog, but thankfully running in QEMU away from my dog. Edit: well, removing LTO with the same options makes it about 10% bigger still.
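For anyone wanting to put a number on size claims like the ones above, here is a minimal sketch. `pct_change` is a made-up helper name, and the two file names in the usage comment are placeholders, not files from this build.

```shell
# pct_change OLD NEW
# Prints the integer percent change from OLD to NEW (truncated toward zero).
pct_change() {
    echo $(( ($2 - $1) * 100 / $1 ))
}

# Usage (placeholder file names, GNU stat assumed):
#   pct_change "$(stat -c %s vmlinux.nolto)" "$(stat -c %s vmlinux.lto)"
```

Comparing the same artifact on both sides (vmlinux against vmlinux, or bzImage against bzImage) keeps the percentage meaningful.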

ionenwks avatar Nov 11 '18 03:11 ionenwks

It might be interesting to compare the speed of some syscall- / kernel-bound workloads when successfully built with LTO. Anyone with an idea on how to start benchmarking our gains or losses?
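One crude starting point, offered as a sketch rather than a proper benchmark: copying single bytes with `dd` costs one `read` and one `write` syscall per byte, so the run time is dominated by kernel entry and exit. The helper name `syscall_bench` and the default count are assumptions made up for illustration.

```shell
# syscall_bench [COUNT]
# Copies COUNT single bytes from /dev/zero to /dev/null, i.e. roughly
# 2 * COUNT syscalls with almost no userspace work in between.
syscall_bench() {
    dd if=/dev/zero of=/dev/null bs=1 count="${1:-1000000}" 2>/dev/null
}

# Usage: boot each kernel in turn and time the same call, e.g.
#   time syscall_bench 5000000
```

Repeating the run several times on an otherwise idle machine, with the same CPU frequency governor on both kernels, makes the comparison less noisy.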

gcs-github avatar Nov 11 '18 13:11 gcs-github

Not sure, but if you check the kernel mailing list plenty of those benchmarks have been done in the past. I remember seeing pretty big gains with LTO, but not sure if those reflected into any gain for daily usage. Some more info about how to benchmark the kernel: https://github.com/graysky2/kernel_gcc_patch

darkbasic avatar Nov 11 '18 13:11 darkbasic

One thing about LTO is you have to build as many of your modules into the kernel as possible... so it knows what it can eliminate when linking... so you get the biggest gains on a completely static kernel (this of course breaks some things that load firmware etc... some of that you can work around by building in the blobs though).

cb88 avatar Nov 14 '18 03:11 cb88

Andi Kleen rebased his LTO patches for the Kernel on 4.20 recently. I've tried it out but had no luck and several module errors along the way. Nevertheless, you can find these patches here: https://github.com/andikleen/linux-misc/tree/lto-420-1

ms178 avatar Jan 08 '19 19:01 ms178

^ Didn't experiment much, but I gave it a quick try and it built fine for me with my configuration and CONFIG_LTO=y, which auto-adds -flto -fno-fat-lto-objects. Didn't try a generic config, and I use almost no modules, which, as stated in the post above, is better suited to an LTO kernel anyway.

Looks like it's using the gcc-ld script and working properly. I do have gold as my default linker (been using it even for kernel).

I imagine it may make more of a difference on a less-lean kernel, but my resulting 4.20 kernel is about 1% smaller than my old, didn't try to boot and also no idea for any performance gains.

ionenwks avatar Jan 09 '19 03:01 ionenwks

@ionenwks I'm trying to replicate the steps on a Gentoo system to build an LTO'd kernel. However, I always error out on the linking portion:

/usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: arch/x86/kernel/head_64.o: requires unsupported dynamic reloc 11; recompile with -fPIC

I have added this flag to the base KBUILD_CFLAGS but to no effect. I also have ld.gold enabled by default. What version of GCC and binutils are you using? Did you make any changes to the Makefile from Andi Kleen's repo?

Promaethius avatar Aug 11 '19 06:08 Promaethius

Hmm... I tried again both with the lto-420-1 branch from back then along with same configuration and the newer lto-5.1-3, and I'm getting the same errors as you now (using gcc 8.3.0 and ld.bfd 2.32).

Not sure what I was using back then but looking at the date I assume I was on gcc 8.2 and binutils 2.30 I think? It's only something I tried real quick, I had no intention to stick with that for now (or boot it).

Edit: Retried with gold as default (switched back to bfd a while ago), doesn't work either, not with current toolchain anyway. Edit2: And no, I hadn't made any changes, used as-is.

ionenwks avatar Aug 11 '19 08:08 ionenwks

@ionenwks thank you for taking the time to check through the issue! I was afraid it was a toolchain version issue, so I wonder if this is a reportable bug? I'm going to take some time today and check if it's a gcc or binutils issue. Edit: I'm throwing some more configuration testing into this mess. Found this article over on the patch list: https://patchwork.kernel.org/patch/10000627/

Promaethius avatar Aug 11 '19 18:08 Promaethius

I was able to build 5.0-1 successfully; however, I did not test it and the system it was on is now gone.

-fPIC would cause reloc .text errors if it was built with visibility=hidden or ssp(but the Makefile already filters that). Maybe -flinker-output=rel would make sense here, but I couldn't get the syntax correct. ~because parts of the kernel build are still static, and static objects aren't able to find PIC references~. If anyone knows his full patchset without a kernel tree that'd be really helpful.

jiblime avatar Aug 12 '19 00:08 jiblime

@jiblime You can find his patchset on the kernel mailing list but it won't really help: https://lkml.org/lkml/2017/11/27/1052 THIN_ARCHIVES was a config option that was removed in 4.19+. It went around the supposed issue of ld -r. But, I've narrowed it down to a ld issue of some sort. There are kernel patches that let you fPIC the code but they aren't working for me yet.

Promaethius avatar Aug 12 '19 02:08 Promaethius

@Promaethius Thanks for the link. I'm currently trying to edit arch/x86/entry/vdso/Makefile to work. At the very bottom you can try appending flags after ${LD} but nothing has worked for me, even the options to specifically suppress the error.

I went and checked a regular kernel and I noticed that it's normal(?) for a hidden symbol to be there.

Both outputs below come from running readelf vclock_gettime.o -s

5.1-3 LTO:

Symbol table '.symtab' contains 25 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS vclock_gettime.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 
     5: 0000000000000000   174 FUNC    LOCAL  DEFAULT    1 do_hres
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    7 
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    8 
     9: 0000000000000000     0 SECTION LOCAL  DEFAULT   10 
    10: 0000000000000000     0 SECTION LOCAL  DEFAULT   11 
    11: 0000000000000000     0 SECTION LOCAL  DEFAULT   12 
    12: 0000000000000000     0 SECTION LOCAL  DEFAULT   14 
    13: 0000000000000000     0 SECTION LOCAL  DEFAULT   15 
    14: 0000000000000000     0 SECTION LOCAL  DEFAULT   17 
    15: 0000000000000000     0 SECTION LOCAL  DEFAULT   19 
    16: 0000000000000000     0 SECTION LOCAL  DEFAULT   20 
    17: 0000000000000000     0 SECTION LOCAL  DEFAULT   18 
    18: 0000000000000000     0 NOTYPE  GLOBAL HIDDEN   UND vvar_vsyscall_gtod_data
    19: 00000000000000b0   111 FUNC    GLOBAL DEFAULT    1 __vdso_clock_gettime
    20: 00000000000000b0   111 FUNC    WEAK   DEFAULT    1 clock_gettime
    21: 0000000000000120    98 FUNC    GLOBAL DEFAULT    1 __vdso_gettimeofday
    22: 0000000000000120    98 FUNC    WEAK   DEFAULT    1 gettimeofday
    23: 0000000000000190    16 FUNC    GLOBAL DEFAULT    1 __vdso_time
    24: 0000000000000190    16 FUNC    WEAK   DEFAULT    1 time
readelf: Warning: compressed section '.debug_str' is corrupted

5.2.8 kernel:

Symbol table '.symtab' contains 27 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS vclock_gettime.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 
     5: 0000000000000000   392 FUNC    LOCAL  DEFAULT    1 do_hres
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    7 
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    8 
     9: 0000000000000000     0 SECTION LOCAL  DEFAULT   10 
    10: 0000000000000000     0 SECTION LOCAL  DEFAULT   11 
    11: 0000000000000000     0 SECTION LOCAL  DEFAULT   12 
    12: 0000000000000000     0 SECTION LOCAL  DEFAULT   14 
    13: 0000000000000000     0 SECTION LOCAL  DEFAULT   15 
    14: 0000000000000000     0 SECTION LOCAL  DEFAULT   17 
    15: 0000000000000000     0 SECTION LOCAL  DEFAULT   19 
    16: 0000000000000000     0 SECTION LOCAL  DEFAULT   20 
    17: 0000000000000000     0 SECTION LOCAL  DEFAULT   18 
    18: 0000000000000000     0 NOTYPE  GLOBAL HIDDEN   UND vvar_vsyscall_gtod_data
    19: 0000000000000000     0 NOTYPE  GLOBAL HIDDEN   UND hvclock_page
    20: 0000000000000000     0 NOTYPE  GLOBAL HIDDEN   UND pvclock_page
    21: 0000000000000190   102 FUNC    GLOBAL DEFAULT    1 __vdso_clock_gettime
    22: 0000000000000190   102 FUNC    WEAK   DEFAULT    1 clock_gettime
    23: 0000000000000200    98 FUNC    GLOBAL DEFAULT    1 __vdso_gettimeofday
    24: 0000000000000200    98 FUNC    WEAK   DEFAULT    1 gettimeofday
    25: 0000000000000270    16 FUNC    GLOBAL DEFAULT    1 __vdso_time
    26: 0000000000000270    16 FUNC    WEAK   DEFAULT    1 time

readelf: Warning: compressed section '.debug_str' is corrupted looks to be of interest. Does this mean there needs to be more debug information built in?
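To compare just the hidden symbols between the two objects without eyeballing the full tables, something like the following could work. It is a sketch that assumes the column layout shown above (visibility in the sixth field, name in the eighth); `hidden_syms` is a hypothetical helper name.

```shell
# hidden_syms
# Reads `readelf -s` output on stdin and prints the names of symbols
# whose visibility column says HIDDEN.
hidden_syms() {
    awk '$6 == "HIDDEN" { print $8 }'
}

# Usage:
#   readelf -sW vclock_gettime.o | hidden_syms
```

On the two tables above this would print `vvar_vsyscall_gtod_data` for the LTO build and additionally `hvclock_page` and `pvclock_page` for the 5.2.8 build.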

jiblime avatar Aug 12 '19 04:08 jiblime

@jiblime I found this on the gcc site today: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fuse-linker-plugin-916

When a file is compiled with -flto without -fuse-linker-plugin, the generated object file is larger than a regular object file because it contains GIMPLE bytecodes and the usual final code (see -ffat-lto-objects). This means that object files with LTO information can be linked as normal object files; if -fno-lto is passed to the linker, no interprocedural optimizations are applied. Note that when -fno-fat-lto-objects is enabled the compile stage is faster but you cannot perform a regular, non-LTO link on them.

I've witnessed Andi Kleen's patchset passing -fno-fat-lto-objects without -fuse-linker-plugin. Will test this theory later today. This could explain why readelf is returning corruption, but pardon my ignorance if that's not the case.

Promaethius avatar Aug 12 '19 18:08 Promaethius

@Promaethius

I've witnessed Andi Kleen's patchset passing -fno-fat-lto-objects without -fuse-linker-plugin

That explains why he uses -fwhole-program and -fipa-cp-clone, since collect2 would be used instead of a linker. I'm assuming he's doing that for compatibility, as GCC documentation claims it's likely to increase code size vs. bfd/gold. I wonder if GentooLTO would be able to do something better...

I believe it's a glibc issue. I've upgraded to sys-libs/glibc-2.30::gentoo and have been able to get past it. Currently recompiling, since one of the paravirtualization options (not sure which) causes it to error.

https://sourceware.org/ml/libc-alpha/2019-08/msg00029.html

* The dynamic linker no longer refuses to load objects which reference
  versioned symbols whose implementation has moved to a different soname
  since the object has been linked.  The old error message, symbol
  FUNCTION-NAME, version SYMBOL-VERSION not defined in file DSO-NAME with
  link time reference, is gone.

It emits a warning, I'm still not sure why since Andi Kleen filters LTO out of it from what I can tell.

Warnings emitted with V=2

  CC      arch/x86/entry/vdso/vdso32-setup.o - due to target missing
  LDS     arch/x86/entry/vdso/vdso.lds - due to target missing
  AS      arch/x86/entry/vdso/vdso-note.o - due to target missing
  CC      arch/x86/entry/vdso/vclock_gettime.o - due to target missing
In file included from ./arch/x86/include/asm/vgtod.h:5,
                 from arch/x86/entry/vdso/vclock_gettime.c:15:
arch/x86/entry/vdso/vclock_gettime.c: In function ‘do_hres’:
./include/linux/compiler.h:182:26: warning: array subscript 1 is outside array bounds of ‘u8[1]’ {aka ‘unsigned char[1]’} [-Warray-bounds]
  182 | case 8: *(__u64 *)res = *(volatile __u64 *)p; break;
      | ^~~~~~~~~~~~~~~~~~~~
./include/linux/compiler.h:193:2: note: in expansion of macro ‘__READ_ONCE_SIZE’
  193 | __READ_ONCE_SIZE;
      | ^~~~~~~~~~~~~~~~
arch/x86/entry/vdso/vclock_gettime.c:37:11: note: while referencing ‘hvclock_page’
   37 | extern u8 hvclock_page
      | ^~~~~~~~~~~~
In file included from ./arch/x86/include/asm/vgtod.h:5,
                 from arch/x86/entry/vdso/vclock_gettime.c:15:
./include/linux/compiler.h:182:26: warning: array subscript 2 is outside array bounds of ‘u8[1]’ {aka ‘unsigned char[1]’} [-Warray-bounds]
  182 | case 8: *(__u64 *)res = *(volatile __u64 *)p; break;
      | ^~~~~~~~~~~~~~~~~~~~
./include/linux/compiler.h:193:2: note: in expansion of macro ‘__READ_ONCE_SIZE’
  193 | __READ_ONCE_SIZE;
      | ^~~~~~~~~~~~~~~~
arch/x86/entry/vdso/vclock_gettime.c:37:11: note: while referencing ‘hvclock_page’
   37 | extern u8 hvclock_page
      | ^~~~~~~~~~~~
  CC      arch/x86/entry/vdso/vgetcpu.o - due to target missing
  VDSO    arch/x86/entry/vdso/vdso64.so.dbg - due to target missing
  OBJCOPY arch/x86/entry/vdso/vdso64.so - due to target missing
  HOSTCC  arch/x86/entry/vdso/vdso2c - due to target missing
  VDSO2C  arch/x86/entry/vdso/vdso-image-64.c - due to target missing
  CC      arch/x86/entry/vdso/vdso-image-64.o - due to target missing
  LDS     arch/x86/entry/vdso/vdso32/vdso32.lds - due to target missing
  CC      arch/x86/entry/vdso/vdso32/vclock_gettime.o - due to target missing
In file included from ./arch/x86/include/asm/vgtod.h:5,
                 from arch/x86/entry/vdso/vdso32/../vclock_gettime.c:15,
                 from arch/x86/entry/vdso/vdso32/vclock_gettime.c:31:
arch/x86/entry/vdso/vdso32/../vclock_gettime.c: In function ‘do_hres’:
./include/linux/compiler.h:182:26: warning: array subscript 1 is outside array bounds of ‘u8[1]’ {aka ‘unsigned char[1]’} [-Warray-bounds]
  182 | case 8: *(__u64 *)res = *(volatile __u64 *)p; break;
      | ^~~~~~~~~~~~~~~~~~~~
./include/linux/compiler.h:193:2: note: in expansion of macro ‘__READ_ONCE_SIZE’
  193 | __READ_ONCE_SIZE;
      | ^~~~~~~~~~~~~~~~
In file included from arch/x86/entry/vdso/vdso32/vclock_gettime.c:31:
arch/x86/entry/vdso/vdso32/../vclock_gettime.c:37:11: note: while referencing ‘hvclock_page’
   37 | extern u8 hvclock_page
      | ^~~~~~~~~~~~
In file included from ./arch/x86/include/asm/vgtod.h:5,
                 from arch/x86/entry/vdso/vdso32/../vclock_gettime.c:15,
                 from arch/x86/entry/vdso/vdso32/vclock_gettime.c:31:
./include/linux/compiler.h:182:26: warning: array subscript 2 is outside array bounds of ‘u8[1]’ {aka ‘unsigned char[1]’} [-Warray-bounds]
  182 | case 8: *(__u64 *)res = *(volatile __u64 *)p; break;
      | ^~~~~~~~~~~~~~~~~~~~
./include/linux/compiler.h:193:2: note: in expansion of macro ‘__READ_ONCE_SIZE’
  193 | __READ_ONCE_SIZE;
      | ^~~~~~~~~~~~~~~~
In file included from arch/x86/entry/vdso/vdso32/vclock_gettime.c:31:
arch/x86/entry/vdso/vdso32/../vclock_gettime.c:37:11: note: while referencing ‘hvclock_page’
   37 | extern u8 hvclock_page
      | ^~~~~~~~~~~~
  AS      arch/x86/entry/vdso/vdso32/note.o - due to target missing
  AS      arch/x86/entry/vdso/vdso32/system_call.o - due to target missing
  AS      arch/x86/entry/vdso/vdso32/sigreturn.o - due to target missing
  VDSO    arch/x86/entry/vdso/vdso32.so.dbg - due to target missing
  OBJCOPY arch/x86/entry/vdso/vdso32.so - due to target missing
  VDSO2C  arch/x86/entry/vdso/vdso-image-32.c - due to target missing
  CC      arch/x86/entry/vdso/vdso-image-32.o - due to target missing

So as I understand, it would be a huge issue to have a textrel in a/the vdso because it'd be a vulnerability in a security feature. Gentoo's wiki actually has a guide on finding and fixing textrels: https://wiki.gentoo.org/wiki/Hardened/Textrels_Guide

But hopefully there's no need to recreate anything. While the vdso*.so files have a textrel flag marked on them, scanelf -T shows that there isn't anything that would point to it.

Glibc 2.29, GCC 9.1.0

 TYPE    PAX   PERM ENDIAN STK/REL/PTL TEXTREL RPATH BIND TEXTRELS FILE 
scanelf: scanelf_file_textrels(): ELF is missing relocation information
scanelf: scanelf_file_textrels(): ELF vdso32.so has TEXTREL markings but doesnt appear to have any real TEXTREL's !?
ET_DYN PeMRxS 0755 LE --- --- R-X TEXTREL   -   LAZY  vdso32.so

It did also emit this, though:

arch/x86/kernel/dumpstack.o: warning: objtool: show_regs.cold()+0x16: sibling call from callable instruction with modified stack frame
arch/x86/kernel/dumpstack.o: warning: objtool: show_regs()+0x0: stack state mismatch: cfa1=7+24 cfa2=7+8

So it looks like it can be possible, but definitely experimental and not a daily driver for myself. I'm going to be grabbing GCC 9.2 now so I won't be getting to it anytime soon (btw, I added 20G of swap with -j5 and it still failed, dammit), but if Glibc 2.30 is the fix, I think it'd be worth a shot to try using this kernel for testing.

If you were to use a linker instead of collect2, you can replace -fwhole-program with -fuse-linker-plugin in scripts/Makefile.lto, as the GNU documentation states it's best not to use the former with the latter. Optimizations that would also help LTO specifically would be -fdevirtualize-at-ltrans and the -fgraphite-identity -floop-nest-optimize options. I've used these along with other flags to compile and run my kernel, but if the linking stage is too much for the machine, the process will overflow and get killed.

What's interesting is that his newest version (as far as I can tell) lacks explicit linker usage but his older versions use -fuse-linker-plugin. So I could be wrong in assuming that removing -fwhole-program is the right way to go.

jiblime avatar Aug 13 '19 14:08 jiblime

Andi Kleen's lto-5.7-2 branch builds and I am currently running it. I've applied the 5.7.14 patch, Gentoo distro patches, and a few other misc. patches with no rejects.

Notes:

  • nouveau does not build because the command used to build it is too long for the shell

  • I wasn't able to load any modules, neither once booted nor from my initrd. Used make mod2yesconfig to convert modules to being built in. [Correction 1 below]

  • ./scripts/Makefile.lto contains the settings for how LTO is applied to the kernel. I appended -flto-compression-level=9 to LTO_CFLAGS

  • In the same file, TMPDIR sets the building directory in the kernel directory instead of /tmp to prevent OOM. I copied the kernel to /var/tmp/portage instead because there are massive writes on disk during linking.

  • The primary demographic of LTO'd kernels seems to be embedded systems

The size of my LTO'd kernel is 22M, modules folder is 800K. Vs. my normal kernel at 11M and modules folder at 71M

  • It feels fast, that counts

Semi-related:

GCC 10's -O2 might be slightly slower than GCC 9's -O2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337#c15

Inliner changes was not targetting to make compile time faster and
compiled code slower. It was intended to reflect more closely modern C++
codebases and get faster binaries (at -O2 and -O2 -flto) without
regressing in code sizes.  In fact more inlining happens and thus we
needed to optimize inliner code carefully to avoid regressions with LTO.

If you have a -march=znver1/znver2 processor and run x86_64 multilib, rebuilding the current GCC 10.2.0 would mean a nice performance boost with this patch:

patch 1, patch 2

Refer to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95435


Correction 1: I incorrectly assumed modules weren't supported with -flto. While building everything into the kernel alleviated the issues, namely framebuffer and Logitech USB support, kernel compilation time was too long and I prefer being able to reload modules. The likely culprit in module failure was TRIM_UNUSED_KSYMS and possibly dracut defaulting to --strip the generated initrd; can't say for certain yet. I didn't get around to testing it enough but now I am able to load amdgpu in my initrd as usual instead of compiling it in.
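A quick way to see whether a suspect option such as TRIM_UNUSED_KSYMS is switched on in a given config, as a sketch: `kconfig_enabled` is a made-up helper name, and it only matches options set to `=y`, not `=m` or string values.

```shell
# kconfig_enabled FILE SYMBOL
# Succeeds if CONFIG_<SYMBOL>=y appears in the kernel config FILE.
kconfig_enabled() {
    grep -q "^CONFIG_$2=y\$" "$1"
}

# Usage, from the kernel source directory:
#   kconfig_enabled .config TRIM_UNUSED_KSYMS && echo "unused ksyms are trimmed"
```

Whether that option is really what broke module loading here is not confirmed in this thread; the check only tells you what the build was configured with.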

jiblime avatar Aug 08 '20 01:08 jiblime

* It feels fast, that counts

Can you describe in what way?

Cheers for the gcc links too

telans avatar Aug 08 '20 02:08 telans

oooh, imma test

barolo avatar Aug 08 '20 11:08 barolo

@jiblime Could you list the patches applied? All are from gentoo's ebuild?

barolo avatar Aug 08 '20 11:08 barolo

@jiblime Could you list the patches applied? All are from gentoo's ebuild?

I haven't built it yet, but this patch applies fine to gentoo-sources-5.8 (just a diff from the lto-5.8.0-1 branch)

https://gist.github.com/telans/728b63dd07c41c9ca6e2ca3d4431db8e


Doesn't build for me unfortunately, lots of:

/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../x86_64-pc-linux-gnu/bin/ld: ./.tmp_vmlinux.kallsyms1.mJXteD.ltrans123.ltrans.o: relocation R_X86_64_32S against `.data' can not be used when making a PIE object; recompile with -fPIE

telans avatar Aug 08 '20 11:08 telans

also, there's no 5.7.14 patch

https://cdn.kernel.org/pub/linux/kernel/v5.x/patch-5.7.14.xz

telans avatar Aug 08 '20 12:08 telans

that patch is already applied... nvm, messed up something, ended with upstream master somehow... this will impact my SSD most def

barolo avatar Aug 08 '20 12:08 barolo

@telans No problem. The first thing I noticed was my dmesg timestamps were lower than usual :p ideally I'll set up a phoronix benchmark to have actual data.

relocation R_X86_64_32S

Are you using ld.gold as your default linker? The Linux kernel needs either GCC/ld.bfd or Clang/ld.lld. https://github.com/InBetweenNames/gentooLTO/issues/338
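To answer the "which linker is my default" question without guessing, the first line of the linker's `--version` output is usually enough. A sketch, assuming the usual banner prefixes (`GNU ld` for bfd, `GNU gold` for gold, `LLD` for lld); `linker_kind` is a hypothetical helper name.

```shell
# linker_kind
# Reads a linker's --version output on stdin and classifies it by the
# prefix of the first line.
linker_kind() {
    case "$(head -n 1)" in
        "GNU gold"*) echo gold ;;
        "GNU ld"*)   echo bfd ;;
        LLD*)        echo lld ;;
        *)           echo unknown ;;
    esac
}

# Usage:
#   ld --version | linker_kind
#   "$(gcc -print-prog-name=ld)" --version | linker_kind   # linker GCC would run
```

The second usage line matters because the `ld` found in PATH and the one GCC invokes are not necessarily the same binary.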

sys-devel/gcc-10.2.0::gentoo was built with the following:
USE="(cxx) fortran graphite lto (multilib) nls nptl objc openmp pch pgo sanitize ssp zstd (-ada) -d -debug -doc (-fixed-point) -go (-hardened) -jit (-libssp) -objc++ -objc-gc -pie -systemtap -test -vanilla -vtv" ABI_X86="(64)"
sys-devel/binutils-2.34-r2::gentoo was built with the following:
USE="gold multitarget nls plugins static-libs -default-gold -doc -test" ABI_X86="(64)"

@barolo https://github.com/jiblime/linux-misc/commits/lto-5.7-prjc-r3 You can pull the patches from here or clone the single branch and build off that. The CPPC patch doesn't work for me, so I leave it off just in case it would cause me to fail to boot. It's a bit messy, I'm still not the greatest at making clean commits. I chose the 5.7-2 branch instead of 5.8 because I wanted to try the Project C scheduler (previously named BMQ, now abbreviated prjc). I'll try the 5.8 branch sometime.

I generally download a vanilla tarball from kernel.org (v5.7, v5.8, etc) and apply the Gentoo patches and incremental patches afterwards. That way I don't have to worry about rejected patches as often

jiblime avatar Aug 08 '20 14:08 jiblime

@jiblime thanks for the branch, made it much easier for me. Compiling

barolo avatar Aug 08 '20 16:08 barolo

Compiled almost cleanly for me and didn't take that long either; had a bunch of "-Wstringop-overflow" warnings for the Bluetooth module. Didn't boot for me, with an error related to scsi. With modules built in it is 20M, and the modules dir is 1M. I have NVMe and amdgpu on that box, gonna try to strip it a bit more.

barolo avatar Aug 08 '20 17:08 barolo

Narrowed it down: hidpp/Logitech's stuff makes it crash, and it doesn't switch to amdgpu output. @jiblime it seems like you've had similar issues, how did you solve them?

Edit: Cleaned it up a bit, built amdgpu, bluetooth, and logitech hidpp as modules. The remaining issue seems to be that the framebuffer isn't being switched during boot.

barolo avatar Aug 08 '20 18:08 barolo

Are you using ld.gold as your default linker? The Linux kernel needs either GCC/ld.bfd or Clang/ld.lld.

Nope, using ld.bfd ( or at least I haven't changed it.)

sys-devel/gcc-10.2.0::gentoo was built with the following:
USE="(cxx) fortran graphite lto (multilib) nls nptl openmp pch pgo (pie) sanitize ssp vtv zstd (-ada) -d -debug -doc (-fixed-point) -go (-hardened) (-jit) (-libssp) -objc -objc++ -objc-gc -systemtap -test -vanilla" ABI_X86="(64)"
sys-devel/binutils-2.34-r2::gentoo was built with the following:
USE="gold nls plugins -default-gold -doc -multitarget -static-libs -test" ABI_X86="(64)"

Forcing LD=ld.bfd doesn't change anything either. I thought it might have been an issue with ripping a patch from the lto-5.8-1 branch, however, the branch too builds with the same relocation errors


Same issue with lto-5.7-2
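One visible difference between the two GCC builds listed in this thread is the `pie` USE flag: it is off (`-pie`) in the build that works and on (`(pie)`) in the one hitting the relocation errors. Whether that is the actual cause is untested here. As a sketch for checking what a given compiler does by default, the predefined macros can be inspected, since GCC defines `__PIE__` when it is generating position-independent executables. `defaults_to_pie` is a made-up helper name.

```shell
# defaults_to_pie
# Reads a dump of GCC's predefined macros on stdin and succeeds if
# __PIE__ is among them.
defaults_to_pie() {
    grep -q '^#define __PIE__ '
}

# Usage:
#   gcc -dM -E - </dev/null | defaults_to_pie && echo "this gcc defaults to PIE"
```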

telans avatar Aug 08 '20 21:08 telans

Update: managed to run it and reach the desktop. The issue was with building all modules in. So I took my working config as a base, used genkernel and made sure that it runs without LTO enabled first, then enabled LTO and booted into the desktop successfully. Ended up with a bunch of drivers disabled, most importantly for network and SATA; luckily my main drive is a PCIe one. Each failed module logged "disagrees about version of symbol module_layout" in dmesg, gonna investigate it now.

Edit: It seems that all of those are modules that weren't built in, so apparently the initramfs isn't working for me.

Edit2: I'm typing from it now. I had to recompile it cleanly, cleaned the config up a bit and built some stuff in. Module loading still doesn't seem to work, as I still got two of those "disagrees..." warnings.
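To get a clean list of which modules are affected, the dmesg lines can be filtered. This is a sketch that assumes the usual message shape, `<module>: disagrees about version of symbol module_layout`; `mismatched_modules` is a hypothetical helper name.

```shell
# mismatched_modules
# Reads kernel log lines on stdin and prints, sorted and de-duplicated,
# the name of every module that failed the module_layout version check.
mismatched_modules() {
    awk '/disagrees about version of symbol module_layout/ {
        for (i = 2; i <= NF; i++)
            if ($i == "disagrees") {
                name = $(i - 1)
                sub(/:$/, "", name)   # drop the trailing colon
                print name
            }
    }' | sort -u
}

# Usage:
#   dmesg | mismatched_modules
```

For any module it lists, comparing the `module_layout` line from `modprobe --dump-modversions <module>.ko` with the one in the build tree's `Module.symvers` should show whether the module on disk was built against a different kernel build than the one running.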

Can't really compare it yet, since it seems to use different schedulers than I had with the zen kernel, and spends more time at lower frequencies. I would have to bench it properly to test it seriously.

I can already tell though that building that kernel is significantly faster under it

barolo avatar Aug 09 '20 09:08 barolo

My gut tells me it has something to do with the -fPIE flag

On Sun, Aug 9, 2020, 3:27 AM Greg Shuiske [email protected] wrote:

Update, managed to run it and reach the desktop. The issue was with building all modules in. So I took my working config as base, used genkernel and made sure that it runs without LTO enabled first, then enabled LTO. Ended with a bunch of drivers disabled, most importantly for network and sata, luckily my main is a pcie one. Each module had disagrees about version of symbol module_layout in dmesg, gonna investigate it now.


Promaethius avatar Aug 09 '20 15:08 Promaethius

@Promaethius I've solved that by changing the modules with warnings to built-in. It's running fine so far, gonna bench it with something now. My whole kernel with the built-in stuff is 10 MB, plus a useless 4 MB initramfs, for a gaming desktop.

barolo avatar Aug 09 '20 15:08 barolo