python-build-standalone
Unable to compile cpython-3.x with optimizations on Ubuntu
Summary
I've tried to build Python on both of the Ubuntu platforms listed below, and the process always fails on cpython. Over several iterations on the Ubuntu server VM:
- Tried to compile with flags `--python=cpython-3.8` and `--optimizations=pgo+lto`, but failed (gist).
- Tried to compile with flags `--python=cpython-3.7` and `--optimizations=pgo+lto`, but failed (gist).
- Tried to compile with flags `--python=cpython-3.7` and `--optimizations=pgo`, but failed (gist).
- Tried to compile with the `--python=cpython-3.7` flag but no optimizations and succeeded! (download and slice off the `.zip` from the end, sha256 is `f994d12161adcf3ced5ad2daf964cab6e8f70913cc90534ee3ac1feb41420404`).
- Tried to compile with the `--python=cpython-3.8` flag but no optimizations and succeeded! (download and slice off the `.zip` from the end, sha256 is `9743d8927e82503bd464bf645014ade886a01f29a8d20999081d4acb95050f41`).
With this evidence, I'd have to say that @indygreg is probably right about musl not liking optimization.
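In case anyone wants to check the two successful builds against the checksums quoted above, here is a minimal sketch using only the standard library. The file names in `EXPECTED` are hypothetical placeholders (the real names depend on how you save the downloads); only the digests come from this report.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's digest equals `expected_hex` (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digests quoted in this report; the file names are hypothetical placeholders.
EXPECTED = {
    "cpython-3.7-noopt.download": "f994d12161adcf3ced5ad2daf964cab6e8f70913cc90534ee3ac1feb41420404",
    "cpython-3.8-noopt.download": "9743d8927e82503bd464bf645014ade886a01f29a8d20999081d4acb95050f41",
}

if __name__ == "__main__":
    for name, expected in EXPECTED.items():
        if not Path(name).exists():
            print(f"{name}: not found, skipping")
        elif matches(name, expected):
            print(f"{name}: OK")
        else:
            print(f"{name}: MISMATCH")
```

Reading in chunks keeps memory use flat even for large archives.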
Host Information
| Platform | Laptop |
|---|---|
| Notes | Default development machine |
| System Model | Purism Librem 15v3 w/ TPM |
| Operating System | Ubuntu 18.04 LTS x86_64 |
| Kernel | 5.3.0-51-generic #44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020 |
| CPU | Intel Core i7-6500U @ 2.5GHz (x4) |
| RAM | 8GB |

| Platform | Tower PC |
|---|---|
| Notes | Used only to host the Ubuntu server VM, via VirtualBox v6.1.6 r137129 (Qt5.6.2); not used for building itself! |
| System Model | Beefy (custom) build |
| Operating System | Windows 10 Pro 64-bit, v1909 (build 18363.836) |
| CPU | AMD Ryzen 9 3900X (x12) |
| RAM | 32GB |

| Platform | Ubuntu server (virtualized) |
|---|---|
| Notes | AMD-V hypervisor, Nested paging, PAE/NX, and KVM paravirtualization are all enabled |
| Operating System | Ubuntu Server 20.04 LTS x86_64 |
| Kernel | 5.4.0-29-generic #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020 |
| CPU | AMD Ryzen 9 3900X (x12) |
| RAM | 16GB |
The underlying failure seems to be:
cpython-3.7> /tools/host/bin/ld: /tools/clang-linux64/lib/clang/10.0.0/lib/linux/libclang_rt.profile-x86_64.a(InstrProfilingFile.c.o): in function `parseAndSetFilename':
cpython-3.7> InstrProfilingFile.c:(.text.parseAndSetFilename+0x/tools/host/bin/ld: fe): undefined reference to `__strdup'
cpython-3.7> /tools/clang-linux64/lib/clang/10.0.0/lib/linux/libclang_rt.profile-x86_64.a(InstrProfilingFile.c.o): in function `parseAndSetFilename':
cpython-3.7> InstrProfilingFile.c:(.text.parseAndSetFilename+0xfe): undefined reference to `__strdup'
What I think is happening here is that Clang needs to link some profiling code into the instrumented binary so that it can emit profile metrics when executed. Because Clang was built against glibc (instead of musl), that profiling code expects a glibc environment, which isn't there when we link against musl.
A possible workaround would be to build a version of Clang (potentially just libclang_rt.profile-x86_64) against musl so the static library can be linked into binaries built against musl.
If you want some optimizations for musl, I'm optimistic the lto optimizations would work: it's just pgo that needs to inject code into an instrumented binary.
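For anyone triaging a similar log, a rough sketch of pulling the unresolved symbols out of linker output like the excerpt quoted above. It assumes GNU ld's message format (symbol quoted with a backtick and an apostrophe); the helper name `undefined_symbols` is made up for this example.

```python
import re
from collections import Counter

# GNU ld quotes the symbol as `name' (opening backtick, closing apostrophe).
UNDEFINED_REF = re.compile(r"undefined reference to `([^']+)'")


def undefined_symbols(log_text):
    """Count each symbol reported as an undefined reference in a linker log."""
    return Counter(UNDEFINED_REF.findall(log_text))


# The four log lines quoted in this thread, copied verbatim
# (including the interleaved output on the second line).
LOG = """\
cpython-3.7> /tools/host/bin/ld: /tools/clang-linux64/lib/clang/10.0.0/lib/linux/libclang_rt.profile-x86_64.a(InstrProfilingFile.c.o): in function `parseAndSetFilename':
cpython-3.7> InstrProfilingFile.c:(.text.parseAndSetFilename+0x/tools/host/bin/ld: fe): undefined reference to `__strdup'
cpython-3.7> /tools/clang-linux64/lib/clang/10.0.0/lib/linux/libclang_rt.profile-x86_64.a(InstrProfilingFile.c.o): in function `parseAndSetFilename':
cpython-3.7> InstrProfilingFile.c:(.text.parseAndSetFilename+0xfe): undefined reference to `__strdup'
"""

if __name__ == "__main__":
    for symbol, count in undefined_symbols(LOG).items():
        print(f"{symbol}: reported {count} time(s)")
```

On this excerpt the only symbol found is `__strdup`, reported twice, which matches the idea that a single routine in the profiling runtime is what trips the link.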
I'm no expert on this, believe me -- but wouldn't link-time optimizations for a static binary be superfluous? I'm assuming I'm wrong on that now, but not sure how.
No, LTO does more.
When you compile something, you first produce individual object files (from individual sources). Then once you have all of those, you link them together.
The compiler applies optimizations at compile time. These are typically the extent of optimizations. Traditionally when you link, the linker effectively assembles a bunch of already compiled/optimized code: it doesn't modify the generated machine code except to remove unused symbols, adjust memory addresses, etc.
Link-time optimization applies an additional round of optimizations at link time. For example, it can see function calls across object files and optimize accordingly. https://llvm.org/docs/LinkTimeOptimization.html has a very high-level overview. It might be best to think of LTO as whole-program optimization and regular compiler optimization as single-file (technically, per compilation unit) optimization.
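If it helps, here is a toy model of that difference in visibility. It is not how LLVM works internally. It only counts which calls have a callee body the optimizer can see (and so could, in principle, inline or specialize), first per compilation unit and then across the whole program. The file and function names are invented for illustration.

```python
# Each hypothetical compilation unit maps a function to the functions it calls.
UNITS = {
    "main.c": {"main": ["parse", "helper"], "helper": []},
    "parse.c": {"parse": ["tokenize"], "tokenize": []},
}


def defining_unit(units):
    """Map every function name to the unit that defines it."""
    return {fn: unit for unit, fns in units.items() for fn in fns}


def visible_calls(units, whole_program):
    """Return (caller, callee) pairs whose callee body the optimizer can see.

    whole_program=False models ordinary per-unit compilation: the callee
    must be defined in the same unit as the caller.
    whole_program=True models LTO: any callee defined anywhere in the
    program is visible.
    """
    home = defining_unit(units)
    result = []
    for unit, fns in units.items():
        for caller, callees in fns.items():
            for callee in callees:
                if callee not in home:
                    continue  # defined outside the program, never visible
                if whole_program or home[callee] == unit:
                    result.append((caller, callee))
    return sorted(result)


if __name__ == "__main__":
    print("per-unit:", visible_calls(UNITS, whole_program=False))
    print("with LTO:", visible_calls(UNITS, whole_program=True))
```

In this model the `main` to `parse` call only becomes visible in the whole-program case, because the two functions live in different units. That holds whether the final binary is static or dynamic, so LTO isn't superfluous for a static build.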
Oh alright, rad. Thanks for the link! I'll be sure to read up on the subject. In my tests I'd been disregarding LTO early on based on my misunderstanding.