WLED icon indicating copy to clipboard operation
WLED copied to clipboard

Enable Link Time Optimisation

Open deece opened this issue 1 year ago • 17 comments

ref: https://gcc.gnu.org/wiki/LinkTimeOptimization

Link Time Optimisation defers final compilation until the link stage, which allows more aggressive inlining and optimisations to occur, as more information is known at link time vs when each object is compiled.

For the esp32dev platform, I have the following size reductions: Without LTO: 1313024 .pio/build/esp32dev/firmware.bin

With LTO: 1214256 .pio/build/esp32dev/firmware.bin

deece avatar Mar 15 '24 00:03 deece

Hi,

I've played with LTO some time ago, and actually ran into some issues:

  • it did not seem to be reliable / stable on all supported devices (especially -C3 had problems)
  • performance had dropped in some cases.
  • stack usage is increasing - this could become a problem for the async_tcp task (web server) where stack usage is already critical
  • stack traces (crash dumps) become unreadable when lto is enabled
  • no benefit on 8266

As there seem to be some drawbacks, it would be good if you make LTO optional.

softhack007 avatar Mar 15 '24 07:03 softhack007

This might be interesting, too https://github.com/platformio/platform-espressif32/pull/702

softhack007 avatar Mar 15 '24 07:03 softhack007

This might be interesting, too platformio/platform-espressif32#702

That's weird, that's the same change that I put a PR in for this morning. I wonder if it's been backed out or clobbered at some point.

[later] Ah, it was never submitted, then the OP pulled the patchset and presumably it was forgotten about.

deece avatar Mar 15 '24 07:03 deece

Another thing that I remember: the arduino-esp32 build scripts have an explicit "-fno-lto"

https://github.com/espressif/arduino-esp32/blob/75b7f4b646599426e92ebec913a4dce14ef528cd/tools/platformio-build-esp32.py#L84

so you might also need build_unflags = -fno-lto to remove the no-lto flag from the command line.

softhack007 avatar Mar 15 '24 08:03 softhack007

Another thing that I remember: the arduino-esp32 build scripts have an explicit "-fno-lto"

https://github.com/espressif/arduino-esp32/blob/75b7f4b646599426e92ebec913a4dce14ef528cd/tools/platformio-build-esp32.py#L84

so you might also need build_unflags = -fno-lto to remove the no-lto flag from the command line.

Interesting, that looks like it snuck in this commit, with no explanation: https://github.com/espressif/arduino-esp32/commit/5502879a5b25e5fff84a7058f448be481c0a1f73

I just added it to build_unflags and saved a little more: 1212336 .pio/build/esp32dev/firmware.bin

Do you have any references for the a loss in performance? (that should be repro'd with a current compiler & reported upstream) AFAIK, LTO eliminates call overhead and dead branches, which should only ever improve performance.

deece avatar Mar 15 '24 08:03 deece

Do you have any references for the a loss in performance? (that should be repro'd with a current compiler & reported upstream) AFAIK, LTO eliminates call overhead and dead branches, which should only ever improve performance

Nothing that's good for a test case, sorry. It was just something I recognized sporadically, when playing with lto and other optimization flags.

Actually this guy makes a similar statement: https://interrupt.memfault.com/blog/best-and-worst-gcc-clang-compiler-flags#-flto

The other problem is debugging and stack traces - these become very unreadable due to lto. However its possible to still get something usefull if compiling with -g3 -ggdb.

softhack007 avatar Mar 15 '24 08:03 softhack007

* it did not seem to be reliable / stable on all supported devices (especially -C3 had problems)

I just rolled a build and threw it onto a C3 I had spare, and there was no sign of the boot loop mentioned in the other thread. I'd put it down to a> ancient compiler and b> relatively new architecture.

It looks like the compiler we are currently using generates reasonable code for the C3. I was able to boot it any join it to my Wifi network. (there's no headers on it though so I can't test with a strip)

Do you have any suggestions on how to make things optional? I can't see a nice way to conditionally fiddle the flags.

deece avatar Mar 15 '24 08:03 deece

We are using LTO since over 1 year for Tasmota (except for the C3). Flash size usage reduction is nice. Yes, debugging is a pain when enabled. But LTO can be switched of when needed. We have ZERO issues with LTO. Tasmota is a stable as it was before. Not used with esp8266. With upcoming Arduino core 3.0.0 it is possible to use with the C3 too. Since newer toolchains used, the gcc compiler / linker bug regarding LTO is fixed. To be clear every espressif toolchain for riscv before gcc 12 produces faulty code for the c3 when enabling LTO!

And yes there are articles floating around, how bad it is to use LTO. Looking closely all issues arriving with enabling LTO are bad code design.

Jason2866 avatar Mar 17 '24 16:03 Jason2866

We are using LTO since over 1 year for Tasmota (except for the C3). Flash size usage reduction is nice. Yes, debugging is a pain when enabled. But LTO can be switched of when needed. We have ZERO issues with LTO. Tasmota is a stable as it was before. Not used with esp8266. With upcoming Arduino core 3.0.0 it is possible to use with the C3 too. Since newer toolchains used, the gcc compiler / linker bug regarding LTO is fixed. To be clear every espressif toolchain for riscv before gcc 12 produces faulty code for the c3 when enabling LTO!

@Jason2866 Is your recommendation to disable LTO for C3 for now? It looks like the build selects Espressif32 @ 5.3.0, which selects toolchain-riscv32-esp 8.4.0+2021r2-patch5.

deece avatar Mar 18 '24 10:03 deece

@deece Yes, disable LTO for the C3 when toolchain-riscv32-esp 8.4.0+2021r2-patch5 is used.

Jason2866 avatar Mar 18 '24 13:03 Jason2866

Do you have any suggestions on how to make things optional? I can't see a nice way to conditionally fiddle the flags.

I think the script that changes elf-ar into elf-gcc-ar could always be kept active. So people building their own firmware, and developers need a way to "opt-in" to link-time optimization.

You could achieve it with a new section like this

[Size_Flags]
build_flags =
  -flto ;; enable LTO
build_unflags =
  -fno-lto ;; remove no-LTO

and then in custom buildenvs, we could simply add ${Size_Flags.build_unflags} to build_unflags and ${Size_Flags.build_flags} to build_flags respectively, to opt-in for size optimization by lto - knowing that lto makes stackdumps unreadable.

softhack007 avatar Mar 18 '24 22:03 softhack007

@deece ... so instead of modifying the [common] section, it seems more appropriate to have the lto flags/unflags one level deeper, in these sections:

https://github.com/Aircoookie/WLED/blob/acf6736afdc579e0bb498b9111e2b8ff7cbaab63/platformio.ini#L230

https://github.com/Aircoookie/WLED/blob/acf6736afdc579e0bb498b9111e2b8ff7cbaab63/platformio.ini#L250

https://github.com/Aircoookie/WLED/blob/acf6736afdc579e0bb498b9111e2b8ff7cbaab63/platformio.ini#L269

https://github.com/Aircoookie/WLED/blob/acf6736afdc579e0bb498b9111e2b8ff7cbaab63/platformio.ini#L306

softhack007 avatar Mar 18 '24 22:03 softhack007

For Tasmota we enable LTO global for alle ESP32x and disable it individually in the C3 envs.

Jason2866 avatar Mar 19 '24 08:03 Jason2866

Thank you @Jason2866 for selflessly contributing your knowledge. ❤️

@softhack007 would it make sense to include LTO for every ESP32 except C3? We can make default build_flags for each of the environments IMO.

@deece could you update build_flags in each of the [esp32xx] environments accordingly if @softhack007 agrees? And perhaps make [esp32xx_debug] environments if needed?

BTW this is beyond my skill level so I cannot comment or review anything.

blazoncek avatar Mar 19 '24 13:03 blazoncek

Hi. I think it would be good to pursue this PR further as we are really at the limits of binary size for currently selected partitions.

Unfortunately my knowledge is limited and I cannot be of any help here.

@deece @softhack007 as you are more experienced could you work together (and if @Jason2866 can help) to make it happen?

IMO as @Jason2866 recommended we should enable LTO across all ESP32 except C3 (which only has 1 build environment) and add esp32 build flags to optionally disable it as @softhack007 suggested.

blazoncek avatar Apr 28 '24 07:04 blazoncek

I'm halfway through a revision, but have been sidetracked with more urgent duties. I'll get back to it soon

On 28 April 2024 5:49:24 pm AEST, "Blaž Kristan" @.***> wrote:

Hi. I think it would be good to pursue this PR further as we are really at the limits of binary size for currently selected partitions.

Unfortunately my knowledge is limited and I cannot be of any help here.

@deece @softhack007 as you are more experienced could you work together (and if @Jason2866 can help) to make it happen?

IMO as @Jason2866 recommended we should enable LTO across all ESP32 except C3 (which only has 1 build environment) and add esp32 build flags to optionally disable it as @softhack007 suggested.

-- Reply to this email directly or view it on GitHub: https://github.com/Aircoookie/WLED/pull/3824#issuecomment-2081377654 You are receiving this because you were mentioned.

Message ID: @.***> -- Alastair D'Silva mob: 0423 762 819

deece avatar Apr 28 '24 08:04 deece

Hey! This pull request has been open for quite some time without any new comments now. It will be closed automatically in a week if no further activity occurs. Thank you for contributing to WLED! ❤️

github-actions[bot] avatar Aug 29 '24 12:08 github-actions[bot]