fluent-bit icon indicating copy to clipboard operation
fluent-bit copied to clipboard

build: Enable LTO for release builds

Open Garfield96 opened this issue 3 years ago • 3 comments

Signed-off-by: Christian Menges [email protected]

The CMake version upgrade (#6012) allows to enable LTO for release builds. As a result, the performance improves, as described in #4304. A downside of using LTO is an increased binary size as well as longer link times. However, in case of fluent-bit the binary bloat is neglectable (1-2Mb) and the increase of build time is only relevant for release builds. If link times become an issue in the future, we can consider to switch to the Mold linker, which is a modern and faster alternative to ld, ldd and gold.

Fixes: #4304

Enter [N/A] in the box, if an item is not applicable to your change.

Testing Before we can approve your change; please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • [N/A] Debug log output from testing the change
  • [N/A] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [ ] Attached local packaging test output showing all targets (including any new ones) build.

Documentation

  • [ ] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Garfield96 avatar Sep 21 '22 09:09 Garfield96

Looks like we need to disable release mode for CentOS 7 compilation checks (although the reason for them is to confirm the code will compile and build) or extend the timeout for this action on every PR as it is taking > 30 minutes now.

  • https://github.com/fluent/fluent-bit/blob/2ee2d42130f214507ccf0f4f1539d253e2676c42/dockerfiles/Dockerfile.centos7#L19
  • https://github.com/fluent/fluent-bit/blob/2ee2d42130f214507ccf0f4f1539d253e2676c42/.github/workflows/pr-compile-check.yaml#L16

Extending the timeout is probably best although we should also cancel all in progress jobs as part of it.

Note that I would expect this PR requires a full build of all targets before it is merged: https://github.com/fluent/fluent-bit/blob/master/packaging/local-build-all.sh We need to confirm it builds across the board for all supported platforms including the raspbian ones rather than just the unit test Ubuntu target.

patrick-stephens avatar Sep 21 '22 12:09 patrick-stephens

I did run local-build-all.sh and found the following issues:

  • CMake complains about a misconfiguration in lib/cfl/src/CMakeLists.txt:24 for some builds, like amazonlinux/2:
    CMake Error at lib/cfl/src/CMakeLists.txt:24 (install):
       install TARGETS given no ARCHIVE DESTINATION for static library target
       "cfl-static".
    
    This can be fixed by replacing (line 27)
      ARCHIVE DESTINATION ${CFL_INSTALL_LIBDIR}   
    
    with
      ARCHIVE DESTINATION lib ${CFL_INSTALL_LIBDIR}   
    
    However, I'm not sure whether this is a good solution.
  • Ubuntu 16.04 and 18.04 are broken with LTO enabled. It seems that gcc versions below v8 either doesn't support LTO or are buggy. In case of 18.04 the issue could be solved by using gcc-8, which is available as package. For Ubuntu 16.04, there are no official packages for gcc v8+, therefore I disabled LTO for gcc versions below 8.

Unfortunately, I can't run ARM builds, because the setup I have available does not support it.

The behavior of CentOS 7 in the CI is strange. I couldn't reproduce such long build times. In other PRs, this CI run took around 6 minutes (e.g. https://github.com/fluent/fluent-bit/actions/runs/3082753214/jobs/4982838372). Therefore, I'm not sure whether this timeout was caused by the change or some problem with the CI runner.

Garfield96 avatar Sep 22 '22 11:09 Garfield96

Thanks @Garfield96, yeah the CFL issue is already resolved but we've not updated the dependency here so that's a known problem I won't blame this PR for! :) https://github.com/fluent/cfl/pull/18

patrick-stephens avatar Sep 22 '22 11:09 patrick-stephens

@patrick-stephens The CentOS 7 compile test is back to ~7 minutes execution time. Hence, this PR is ready for review/merge.

Garfield96 avatar Sep 26 '22 18:09 Garfield96

@niedbalski I would highly appreciate it if you could have a look at this PR. Thanks.

Garfield96 avatar Oct 29 '22 09:10 Garfield96

@niedbalski I would highly appreciate it if you could have a look at this PR. Thanks.

@Garfield96 I'll ask @niedbalski once he is back from holiday to take a look.

patrick-stephens avatar Oct 31 '22 12:10 patrick-stephens