python-build-standalone

Unable to compile cpython-3.x with optimizations on Ubuntu

Open jwarner112 opened this issue 5 years ago • 4 comments

Summary

I've tried to build Python on both of the Ubuntu platforms listed below, and the process always fails on cpython. Over several iterations on the Ubuntu server VM:

  • Tried to compile with flags --python=cpython-3.8 and --optimizations=pgo+lto, but failed (gist).
  • Tried to compile with flags --python=cpython-3.7 and --optimizations=pgo+lto, but failed (gist).
  • Tried to compile with flags --python=cpython-3.7 and --optimizations=pgo, but failed (gist).
  • Tried to compile with the --python=cpython-3.7 flag but no optimizations and succeeded! (download and slice off the .zip from the end, sha256 is f994d12161adcf3ced5ad2daf964cab6e8f70913cc90534ee3ac1feb41420404).
  • Tried to compile with the --python=cpython-3.8 flag but no optimizations and succeeded! (download and slice off the .zip from the end, sha256 is 9743d8927e82503bd464bf645014ade886a01f29a8d20999081d4acb95050f41)

With this evidence, I'd have to say that @indygreg is probably right about musl not liking optimization.
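For anyone who downloads the unoptimized builds listed above, here is a minimal sketch of checking a file against one of the quoted sha256 values. The function names and the local file path are hypothetical; only Python's standard hashlib module is assumed.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's digest with an expected hex string, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Usage would look like matches("cpython-3.7.tar.zst", "f994d121...") with the full digest from the list, where the file name is whatever the download was saved as.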

Host Information

Platform: Laptop
Notes: Default development machine
System Model: Purism Librem 15v3 w/ TPM
Operating System: Ubuntu 18.04 LTS x86_64
Kernel: 5.3.0-51-generic #44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020
CPU: Intel Core i7-6500U @ 2.5GHz (x4)
RAM: 8GB

Platform: Tower PC
Notes: Used only to host the Ubuntu server VM, via VirtualBox v6.1.6 r137129 (Qt5.6.2); not used for building itself!
System Model: Beefy (custom) build
Operating System: Windows 10 Pro 64-bit, v1909 (build 18363.836)
CPU: AMD Ryzen 9 3900X (x12)
RAM: 32GB

Platform: Ubuntu server (virtualized)
Notes: AMD-V hypervisor, Nested paging, PAE/NX, and KVM paravirtualization are all enabled
Operating System: Ubuntu Server 20.04 LTS x86_64
Kernel: 5.4.0-29-generic #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020
CPU: AMD Ryzen 9 3900X (x12)
RAM: 16GB

jwarner112 • May 16 '20 04:05

The underlying failure seems to be:

cpython-3.7> /tools/host/bin/ld: /tools/clang-linux64/lib/clang/10.0.0/lib/linux/libclang_rt.profile-x86_64.a(InstrProfilingFile.c.o): in function `parseAndSetFilename':
cpython-3.7> InstrProfilingFile.c:(.text.parseAndSetFilename+0x/tools/host/bin/ld: fe): undefined reference to `__strdup'
cpython-3.7> /tools/clang-linux64/lib/clang/10.0.0/lib/linux/libclang_rt.profile-x86_64.a(InstrProfilingFile.c.o): in function `parseAndSetFilename':
cpython-3.7> InstrProfilingFile.c:(.text.parseAndSetFilename+0xfe): undefined reference to `__strdup'

What I think is happening here is that Clang needs to link some profiling code into the instrumented binary so that it can emit profile metrics when executed. Because Clang was built against glibc (instead of musl), that profiling code expects glibc symbols such as __strdup, which the linker error shows are missing when we link against musl.
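To see quickly which symbols a failed build is missing, the linker output can be scanned for "undefined reference" diagnostics. This is only a sketch: the function name and the regular expression are my own, and it assumes GNU ld's message format as it appears in the log above.

```python
import re

# Matches the tail of a GNU ld diagnostic such as:
#   ...: undefined reference to `__strdup'
# Assumption: the symbol is wrapped in an opening backtick (or quote) and a closing quote.
UNDEFINED_REF = re.compile(r"undefined reference to [`'](?P<symbol>[^'`]+)'")


def undefined_symbols(log_text):
    """Return the sorted, de-duplicated symbols reported as undefined in a build log."""
    return sorted({m.group("symbol") for m in UNDEFINED_REF.finditer(log_text)})
```

Run against the four log lines quoted above, this reports a single missing symbol, __strdup, even though the parallel build interleaved part of the output.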

A possible workaround would be to build a version of Clang (potentially just libclang_rt.profile-x86_64) against musl so the static library can be linked into binaries built against musl.

If you want some optimizations for musl, I'm optimistic the lto optimizations would work: it's just pgo that needs to inject code into an instrumented binary.

indygreg • May 17 '20 02:05

I'm no expert on this, believe me -- but wouldn't link-time optimizations for a static binary be superfluous? I'm assuming I'm wrong on that now, but not sure how.

jwarner112 • May 17 '20 03:05

No, LTO does more.

When you compile something, you first produce individual object files (from individual sources). Then once you have all of those, you link them together.

The compiler applies optimizations at compile time, and typically that is where optimization ends. Traditionally when you link, the linker effectively assembles a bunch of already compiled/optimized code: it doesn't modify the generated machine code except to remove unused symbols, adjust memory addresses, etc.

Link-time optimization applies an additional round of optimizations at linking time. For example, it can see function calls across object files and optimize accordingly. https://llvm.org/docs/LinkTimeOptimization.html has a very high-level overview. It might be best to think of LTO as whole-program optimization and regular compiler optimization as single-file (technically, per compilation unit) optimization.
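As a rough illustration of the compile-then-link flow described above, here is a sketch that generates the command lines for a two-step build with and without LTO. The helper name, the source file names, and the cc driver are all hypothetical; the only real flag assumed is -flto, which both Clang and GCC accept and which has to be passed at the compile step and the link step.

```python
def lto_build_commands(sources, output, compiler="cc", lto=True):
    """Return argv lists: one compile command per source file, then one link command.

    With lto=True, -flto is added to every step so the object files carry
    the compiler's intermediate representation and the link step can
    optimize across compilation units.
    """
    flags = ["-O2"] + (["-flto"] if lto else [])
    objects = [src.rsplit(".", 1)[0] + ".o" for src in sources]
    compile_cmds = [
        [compiler, *flags, "-c", src, "-o", obj]
        for src, obj in zip(sources, objects)
    ]
    link_cmd = [compiler, *flags, *objects, "-o", output]
    return compile_cmds + [link_cmd]
```

For example, lto_build_commands(["a.c", "b.c"], "app") yields two compile commands and one link command, each carrying -flto; with lto=False the same three commands come back without it, which is the traditional flow where the linker only stitches finished object code together.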

indygreg • May 17 '20 04:05

Oh alright, rad. Thanks for the link! I'll be sure to read up on the subject. In my tests I'd been disregarding LTO early on based on my misunderstanding.

jwarner112 • May 17 '20 07:05