Strip debug symbols of wheels
I was just made aware by @mattip that some Python distributions have -g in sysconfig.get_config_vars('CFLAGS'), and thus build with debug symbols (including the Python versions in the manylinux images, it seems). Apparently the reason is that distributions strip these symbols before packaging for Debian/Fedora/... (and often ship the symbols themselves as a separate debug package).
The catch is that these sysconfig values are also used when building wheels. So should we somehow strip symbols as well, or add -Wl,-strip-all to the build flags, or ... ?
See also numpy/numpy#16110
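To check whether a given interpreter is affected, something like the following could be run inside the build environment. This is only a rough sketch: the helper name debug_flags is made up for illustration, the pattern only recognises the common -g, -g1..-g3 and -ggdb forms, and it does not model a later -g0 overriding an earlier -g.

```python
import re
import shlex
import sysconfig

# Matches -g, -g1..-g3, -ggdb and -ggdb1..-ggdb3; deliberately not -g0.
# Assumption: other debug-related flags (e.g. -gdwarf-N) are not covered.
_DEBUG_FLAG = re.compile(r"-g(gdb)?[1-3]?")


def debug_flags(cflags):
    """Return the debug-info flags found in a CFLAGS string (hypothetical helper)."""
    # CFLAGS can be None on some platforms (e.g. Windows), hence the `or ""`.
    return [flag for flag in shlex.split(cflags or "") if _DEBUG_FLAG.fullmatch(flag)]


cflags = sysconfig.get_config_var("CFLAGS")
print("CFLAGS:", cflags)
print("debug flags:", debug_flags(cflags))
```

If the second line prints a non-empty list, extensions built with the default flags will most likely carry debug symbols.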
xref MacPython/numpy-wheels#82. FWIW, multibuild uses -Wl,-strip-all by default
It's not something I'd like cibuildwheel to do by default - for small C extensions, including debug symbols is very nice so that users can supply crash reports. But it might be nice to document it somewhere, for example in the Tips and Tricks section of the docs "Why are my wheels so big?"
That could also work, ofc. If we then document that adding -Wl,-strip-all does the job, that would help.
On the other hand, wouldn't stripping them be a sensible default? By default Python won't print any C stack trace, and it would be reasonably hard for a typical Python user to get one (going through gdb would then be the way to go, no, or is there a simpler way?).
The problem with not stripping by default is that (almost) no one reads the docs unless there's a problem (like missing debug symbols).
That's true, but I'm not sure it's that big of a problem. We don't minify our Python code, even though it could save loads of space/bandwidth. I guess it's a philosophy thing for me - software should be open and hackable by default, especially in open source. But definitely document, as it could be really handy for some projects :)
This comment suggests using --strip-debug as a compromise between information and size.
For what it's worth, I've tried compiling with multiple settings and running strip (with or without --strip-debug):
debug.so 91M
debug_strip-debug.so 42M
debug_strip.so 36M
minsizerel.so 28M
minsizerel_strip.so 28M
release.so 31M
release_strip.so 31M
relwithdebinfo.so 91M
relwithdebinfo_strip-debug.so 32M
relwithdebinfo_strip.so 29M
and if you zip them, as they will be in a wheel (the ratios seem to stay approximately the same):
debug.zip 28M
debug_strip-debug.zip 11M
debug_strip.zip 9.2M
minsizerel.zip 7.8M
minsizerel_strip.zip 7.8M
release.zip 9.2M
release_strip.zip 9.2M
relwithdebinfo.zip 28M
relwithdebinfo_strip-debug.zip 9.0M
relwithdebinfo_strip.zip 8.5M
These are CMake build types, so:
- Debug: -g
- Release: -O3 -DNDEBUG
- RelWithDebInfo: -g -O2 -DNDEBUG
- MinSizeRel: -Os -DNDEBUG
I don't know how representative my project is (I'm wrapping an enormous code base that isn't mine but is more or less normal C/C++, and there are also lots of template instantiations coming from pybind11 that will result in long names, I suppose), but stripping symbols brings builds with -g down to approximately a third of their size.
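For reference, the "approximately a third" figure can be reproduced from the unzipped sizes listed above. A small sketch, using only the numbers from that listing (the dictionary and helper names are just for illustration):

```python
# Unzipped .so sizes in MB, copied from the listing above.
sizes_mb = {
    "debug": 91,
    "debug_strip-debug": 42,
    "debug_strip": 36,
    "relwithdebinfo": 91,
    "relwithdebinfo_strip-debug": 32,
    "relwithdebinfo_strip": 29,
}


def ratio(stripped, original):
    """Size of the stripped build as a fraction of the unstripped one."""
    return sizes_mb[stripped] / sizes_mb[original]


for base in ("debug", "relwithdebinfo"):
    for suffix in ("_strip-debug", "_strip"):
        name = base + suffix
        print(f"{name}: {ratio(name, base):.0%} of {base}")
```

The Release and MinSizeRel builds are left out because strip made no visible difference to them.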
> That's true, but I'm not sure it's that big of a problem. We don't minify our Python code, even though it could save loads of space/bandwidth. I guess it's a philosophy thing for me - software should be open and hackable by default, especially in open source. But definitely document, as it could be really handy for some projects :)
While what you say makes sense, I slightly disagree in this particular context. I agree that software should be open and hackable by default, but built binaries need not be. Considering that most people would use cibuildwheel as the final step for releasing software, I would say stripping out the debug symbols by default would be the wiser choice (at least in my opinion). Alternatively, there should be an easy option to configure it.
I guess... I agree that there's some debate to be had here. cibuildwheel doesn't have a position on this though - Python (via sysconfig) is setting some defaults for CFLAGS to enable this behaviour. I'm not sure I'd want to add code into cibuildwheel to override that - it could get confusing to users where it's coming from.
> there should be an easy option to configure it
Module authors have full control over how their extensions compile through setup.py. If you want to strip these symbols, I believe you can do:
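from setuptools import setup, Extension

setup(
    ext_modules=[
        # -g0 disables debug info; it is passed after the default CFLAGS
        Extension('_foo', ['foo.c'], extra_compile_args=['-g0'])
    ],
)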
Refs: https://docs.python.org/3/distutils/setupscript.html#other-options https://clang.llvm.org/docs/UsersManual.html#cmdoption-g0
If somebody can confirm the above syntax that'd be great! Or if there's a better way, let me know. Then we can add some documentation showing how best to do this.
I just tried adding -g0 to one of my projects, and it worked out well. (My ~13MB wheels are now ~1.2MB, and when unzipped, that's a reduction from around 52MB to 2MB :sweat_smile:). So I agree the user still has full control. Just to continue the debate, though: one other point to consider is that since cibuildwheel aims to be an easy-to-use alternative to the more customizable multibuild, (I think) a lot of people who are not well versed in C/C++ and setup.py (like me) might want to use it, and expect cibuildwheel to apply the best practices for building wheels for them. So if this isn't automated, at least some documentation guiding them towards it might help. Again, I'm just extending the argument from this perspective - you can probably make the right call.
> It's not something I'd like cibuildwheel to do by default - for small C extensions, including debug symbols is very nice so that users can supply crash reports.
It doesn't make any sense if the extension is linked against the release edition of Python's run-time library, does it?
Since Python 3.8, debug and release builds use the same ABI: https://docs.python.org/3/whatsnew/3.8.html#debug-build-uses-the-same-abi-as-release-build
@cher-nov You still get full names of the functions and methods of the extension module in the stack traces, though. So often, to locate a bug, that's more than enough, and you don't need a debug build of Python itself.
If we had a "tutorial" page, this might somehow make it in; otherwise, should it just be an entry in the FAQ? Pybind11's setup helpers add this by default. https://github.com/pybind/pybind11/blob/721834b422482a522abd4e83f11d545ef876f997/pybind11/setup_helpers.py#L146
Yes, an entry in the FAQ would be good, @henryiii. I guess it would be CIBW_ENVIRONMENT: CFLAGS=-g0?
Hello,
FYI I'm working on this issue in Psycopg 3: https://github.com/psycopg/psycopg/issues/142
I think Cython libraries are big offenders, because the generated names are massive. Some stats from our side:
The worst offender in psycopg 3.0.2 is the x86-64 Python 3.8 wheel package. After stripping our .so files, the download size shrank by 33%:
$ ls -l */psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl
-rw-rw-r-- 1 piro piro 6340205 Nov 8 14:32 tmp/psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl
-rw-rw-r-- 1 piro piro 4275873 Nov 8 18:45 tmpstrip/psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl
The unpacked footprint of all the installed libs shrank by 60%:
$ for f in */psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl; do unzip -l $f; done | grep files
28606803 30 files
11346257 30 files
The footprint of the psycopg binaries alone shrank by about 92% (from 18716416 to 1455872 bytes for the two files combined):
$ for f in */psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl; do unzip -l $f; done | grep so$
15737560 2021-11-08 14:32 psycopg_binary/_psycopg.cpython-38-x86_64-linux-gnu.so
2978856 2021-11-08 14:32 psycopg_binary/pq.cpython-38-x86_64-linux-gnu.so
1139416 2021-11-08 18:44 psycopg_binary/_psycopg.cpython-38-x86_64-linux-gnu.so
316456 2021-11-08 18:44 psycopg_binary/pq.cpython-38-x86_64-linux-gnu.so
Running auditwheel repair --strip didn't work for us because it broke some of the system libraries, which then failed to import with the message ELF load command address/offset not properly aligned. However, the system libraries seem to be stripped already, and there was no meaningful decrease in their size (some of them actually increased...). So we are experimenting with a pre-repair script to strip only our own .so files.
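To illustrate the idea (this is not the actual psycopg script), a pre-repair step could look roughly like the sketch below. It assumes the wheel has already been unpacked into a directory (for example with wheel unpack, and repacked afterwards with wheel pack so that RECORD is regenerated), that binutils strip is on PATH, and that the project's own extensions live under the package directory. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def own_extensions(unpacked_wheel, package):
    """List the .so files under the package's own directory only.

    Anything outside that directory (such as a vendored *.libs folder)
    is deliberately left alone.
    """
    return sorted((Path(unpacked_wheel) / package).rglob("*.so"))


def strip_own_extensions(unpacked_wheel, package, strip_args=("--strip-debug",)):
    """Run `strip` on the package's own extension modules (assumes binutils)."""
    for so_file in own_extensions(unpacked_wheel, package):
        subprocess.run(["strip", *strip_args, str(so_file)], check=True)
```

Calling strip_own_extensions("unpacked/psycopg_binary-3.0.2", "psycopg_binary") would then touch only the two extension modules listed above; the unpacked directory name here is an assumption.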