cibuildwheel

Strip debug symbols of wheels

Open YannickJadoul opened this issue 5 years ago • 16 comments

@mattip just made me aware that some Python distributions have -g in sysconfig.get_config_vars('CFLAGS') and thus include debug symbols (including, it seems, the versions in the manylinux images). Apparently the reason is that these symbols are stripped before the binaries are packaged for the Debian/Fedora/... package managers (and the symbols themselves are often shipped as a separate package for debugging).

The thing about building wheels is that these sysconfig values are used when building them. So should we somehow strip symbols as well, or add -Wl,-strip-all to the build flags, or ... ?
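To see whether a given interpreter is affected, something like the following could be run inside the build environment. It is only a sketch: has_debug_flag is a made-up helper, and it assumes GCC/Clang-style flags where a later -g0 overrides an earlier -g.

```python
import re
import sysconfig


def has_debug_flag(cflags: str) -> bool:
    """Return True if a GCC/Clang-style flag string ends up requesting debug info."""
    debug = False
    for flag in cflags.split():
        # Matches -g, -g0..-g3, -ggdb and -ggdb0..-ggdb3, but not e.g. -grecord-gcc-switches.
        match = re.fullmatch(r"-g(?:gdb)?([0-3])?", flag)
        if match:
            # Level 0 turns debug info off again; the last such flag wins.
            debug = match.group(1) != "0"
    return debug


# get_config_var returns None where CFLAGS is not defined (e.g. on Windows).
cflags = sysconfig.get_config_var("CFLAGS") or ""
print("CFLAGS:", cflags)
print("debug info requested:", has_debug_flag(cflags))
```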

YannickJadoul • Apr 30 '20 21:04

See also numpy/numpy#16110

YannickJadoul • Apr 30 '20 21:04

xref MacPython/numpy-wheels#82. FWIW, multibuild uses -Wl,-strip-all by default

mattip • Apr 30 '20 21:04

It's not something I'd like cibuildwheel to do by default - for small C extensions, including debug symbols is very nice so that users can supply crash reports. But it might be nice to document it somewhere, for example in the Tips and Tricks section of the docs "Why are my wheels so big?"

joerick • May 01 '20 16:05

That could also work, ofc. If we then document that adding -Wl,-strip-all does the job, that would help.

On the other hand, wouldn't stripping them be a sensible default? By default Python won't print any C stack trace, and it would be reasonably hard for a typical Python user to get one (going through gdb would be the way to go, no? Or is there a simpler way?).

The problem with not stripping by default is that (almost) no one reads the docs unless there's a problem (like missing debug symbols).

YannickJadoul • May 01 '20 16:05

> The problem with not stripping by default is that (almost) no one reads the docs unless there's a problem (like missing debug symbols).

That's true, but I'm not sure it's that big of a problem. We don't minify our Python code, even though it could save loads of space/bandwidth. I guess it's a philosophy thing for me - software should be open and hackable by default, especially in open source. But definitely document, as it could be really handy for some projects :)

joerick • May 02 '20 16:05

This comment suggests using --strip-debug as a compromise between information and size.

mattip • May 02 '20 18:05

For what it's worth, I've tried to compile with multiple settings, and run strip (with or without --strip-debug):

debug.so                       91M
debug_strip-debug.so           42M
debug_strip.so                 36M
minsizerel.so                  28M
minsizerel_strip.so            28M
release.so                     31M
release_strip.so               31M
relwithdebinfo.so              91M
relwithdebinfo_strip-debug.so  32M
relwithdebinfo_strip.so        29M

and if you zip them, as they will be in a wheel (the ratios seem to stay approximately the same):

debug.zip                        28M
debug_strip-debug.zip            11M
debug_strip.zip                 9.2M
minsizerel.zip                  7.8M
minsizerel_strip.zip            7.8M
release.zip                     9.2M
release_strip.zip               9.2M
relwithdebinfo.zip               28M
relwithdebinfo_strip-debug.zip  9.0M
relwithdebinfo_strip.zip        8.5M

These are CMake build types, so:

  • Debug: -g
  • Release: -O3 -DNDEBUG
  • RelWithDebInfo: -g -O2 -DNDEBUG
  • MinSizeRel: -Os -DNDEBUG

I don't know how representative my project is (I'm wrapping an enormous code base that isn't mine but is more or less normal C/C++, and there are also lots of template instantiations coming from pybind11, which I suppose result in long names), but for builds with -g, stripping symbols brings the size down to roughly 30-40% of the original (91M down to 29-36M).
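For reference, the ratios can be worked out from the unzipped sizes listed above. The snippet only redoes that arithmetic on the fully stripped variants; remaining_fraction is just an illustrative name.

```python
# (size before strip, size after plain `strip`) in MB, taken from the listing above.
sizes_mb = {
    "debug": (91, 36),
    "relwithdebinfo": (91, 29),
    "release": (31, 31),
    "minsizerel": (28, 28),
}


def remaining_fraction(before: float, after: float) -> float:
    """Fraction of the original size that is left after stripping."""
    return after / before


for name, (before, after) in sizes_mb.items():
    fraction = remaining_fraction(before, after)
    print(f"{name:15s} {before:3d}M -> {after:3d}M  ({fraction:.0%} of the original)")
```

The builds without -g come out unchanged at this precision, which matches the listing.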

YannickJadoul • May 03 '20 15:05

> That's true, but I'm not sure it's that big of a problem. We don't minify our Python code, even though it could save loads of space/bandwidth. I guess it's a philosophy thing for me - software should be open and hackable by default, especially in open source. But definitely document, as it could be really handy for some projects :)

While what you say makes sense, I slightly disagree in this particular context. I agree that software should be open and hackable by default, but built binaries need not be. Considering that most people would use cibuildwheel as the final step in releasing software, I would say that stripping debug symbols by default would be the wiser choice (at least in my opinion). Alternatively, there should be an easy option to configure it.

chaitan94 • Jul 11 '20 17:07

I guess... I agree that there's some debate to be had here. cibuildwheel doesn't have a position on this though - Python (via sysconfig) is setting some defaults for CFLAGS to enable this behaviour. I'm not sure I'd want to add code into cibuildwheel to override that - it could get confusing to users where it's coming from.

> there should be an easy option to configure it

Module authors have full control over how their extensions compile through setup.py. If you want to strip these symbols, I believe you can do:

from setuptools import setup, Extension

setup(
  ext_modules=[
    # -g0 asks GCC/Clang to emit no debug information
    Extension('_foo', ['foo.c'], extra_compile_args=['-g0'])
  ],
)

Refs: https://docs.python.org/3/distutils/setupscript.html#other-options https://clang.llvm.org/docs/UsersManual.html#cmdoption-g0

If somebody can confirm the above syntax that'd be great! Or if there's a better way, let me know. Then we can add some documentation showing how best to do this.
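If the same setup.py also has to build on Windows, the flag could be made conditional. This is a rough sketch, not a confirmed recipe: no_debug_compile_args and make_extension are hypothetical helpers, and it assumes that every non-Windows build uses a GCC- or Clang-compatible compiler and that MSVC should simply not be handed -g0.

```python
import sys


def no_debug_compile_args(platform: str = sys.platform) -> list:
    """Hypothetical helper: extra compile args that switch debug info off."""
    if platform == "win32":
        # Assumption: MSVC is in use there and does not know -g0.
        return []
    return ["-g0"]


def make_extension():
    """Build the Extension from the example above with platform-dependent args."""
    # Imported lazily so the helper above can be used without setuptools installed.
    from setuptools import Extension

    return Extension("_foo", ["foo.c"], extra_compile_args=no_debug_compile_args())


# In setup.py this would be passed on as: setup(ext_modules=[make_extension()])
```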

joerick • Jul 11 '20 19:07

I just tried adding -g0 to one of my projects, and it worked out well. (My ~13MB wheels are now ~1.2MB, and when unzipped, that's a reduction from around 52MB to 2MB :sweat_smile:). So I agree that the user still has full control.

Maybe just to continue the debate, though: one other point to consider is that cibuildwheel aims to be an easy-to-use alternative to the more customizable multibuild. I think a lot of people who are not well versed in C/C++ and setup.py (like me) might want to use it and expect cibuildwheel to apply the best practices for building wheels for them. So if this isn't automated, at least some documentation guiding them towards it would help. Again, I'm just extending the argument from this perspective; you are probably best placed to make the right call.

chaitan94 • Jul 12 '20 06:07

> It's not something I'd like cibuildwheel to do by default - for small C extensions, including debug symbols is very nice so that users can supply crash reports.

It doesn't make any sense if the extension is linked against the release edition of Python's run-time library, does it?

cher-nov • Nov 02 '20 11:11

> > It's not something I'd like cibuildwheel to do by default - for small C extensions, including debug symbols is very nice so that users can supply crash reports.
>
> It doesn't make any sense if the extension is linked against the release edition of Python's run-time library, does it?

Since Python 3.8, debug builds use the same ABI as release builds: https://docs.python.org/3/whatsnew/3.8.html#debug-build-uses-the-same-abi-as-release-build

Czaki • Nov 02 '20 11:11

@cher-nov You still get the full names of the extension module's functions and methods in the stack traces, though. Often that's more than enough to locate a bug, and you don't need a debug build of Python itself.

YannickJadoul • Nov 02 '20 11:11

If we had a "tutorial" page, this might make it in there; otherwise, should it just be an entry in the FAQ? pybind11's setup helpers add this by default. https://github.com/pybind/pybind11/blob/721834b422482a522abd4e83f11d545ef876f997/pybind11/setup_helpers.py#L146

henryiii • Feb 01 '21 04:02

Yes, an entry in the FAQ would be good, @henryiii. I guess it would be CIBW_ENVIRONMENT: CFLAGS=-g0?
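One possible way to try that setting locally is to launch cibuildwheel from Python with the variable set. This is only a sketch: cibw_env and build_wheels are made-up names, it assumes cibuildwheel is installed in the current interpreter, and it overwrites any CIBW_ENVIRONMENT value that is already set.

```python
import os
import subprocess
import sys


def cibw_env(base=None) -> dict:
    """Copy an environment and ask cibuildwheel to build with CFLAGS=-g0."""
    env = dict(os.environ if base is None else base)
    # Note: this replaces an existing CIBW_ENVIRONMENT instead of merging with it.
    env["CIBW_ENVIRONMENT"] = "CFLAGS=-g0"
    return env


def build_wheels(platform: str = "linux") -> None:
    """Run cibuildwheel for one platform with the modified environment."""
    subprocess.run(
        [sys.executable, "-m", "cibuildwheel", "--platform", platform],
        env=cibw_env(),
        check=True,
    )
```

In a CI configuration the same effect would come from setting the environment variable directly, as suggested above.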

joerick • Feb 03 '21 18:02

Hello,

FYI I'm working on this issue in Psycopg 3: https://github.com/psycopg/psycopg/issues/142

I think Cython libraries are big offenders, because the generated names are massive. Some stats from our side:

The worst offender in psycopg 3.0.2 is the x86-64 Python 3.8 wheel package. After stripping our .so files, the download size shrank by 33%:

$ ls -l */psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl
-rw-rw-r-- 1 piro piro 6340205 Nov  8 14:32 tmp/psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl
-rw-rw-r-- 1 piro piro 4275873 Nov  8 18:45 tmpstrip/psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl

The unpacked footprint of all the installed libs shrank by 60%:

$ for f in */psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl; do unzip -l $f; done | grep files
 28606803                     30 files
 11346257                     30 files

The footprint of the psycopg binaries alone shrank by 88%:

$ for f in */psycopg_binary-3.0.2-cp38-cp38-manylinux_2_24_x86_64.whl; do unzip -l $f; done | grep so$
 15737560  2021-11-08 14:32   psycopg_binary/_psycopg.cpython-38-x86_64-linux-gnu.so
  2978856  2021-11-08 14:32   psycopg_binary/pq.cpython-38-x86_64-linux-gnu.so
  1139416  2021-11-08 18:44   psycopg_binary/_psycopg.cpython-38-x86_64-linux-gnu.so
   316456  2021-11-08 18:44   psycopg_binary/pq.cpython-38-x86_64-linux-gnu.so

Running auditwheel repair --strip didn't work for us because it broke some of the system libraries, which then failed to import with the message "ELF load command address/offset not properly aligned". However, the system libraries seem to be stripped already, and there was no significant decrease in their size (some of them actually grew...). So we are experimenting with a pre-repair script that strips only our own .so files.
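Such a pre-repair step might look roughly like this. It is a sketch under assumptions, not the actual psycopg script: own_extensions and strip_own_extensions are hypothetical helpers, the wheel is assumed to be already unpacked into a directory, binutils' strip is assumed to be on PATH, and repacking the wheel (including updating its RECORD file) is left out.

```python
import subprocess
from pathlib import Path


def own_extensions(root, package: str) -> list:
    """List the .so files inside <root>/<package>, leaving everything else alone."""
    return sorted(Path(root, package).rglob("*.so"))


def strip_own_extensions(root, package: str, strip_args=()) -> None:
    """Run `strip` on the package's own extension modules only."""
    for so_file in own_extensions(root, package):
        # e.g. strip_args=("--strip-debug",) to keep the symbol table.
        subprocess.run(["strip", *strip_args, str(so_file)], check=True)
```

Calling strip_own_extensions("unpacked_wheel", "psycopg_binary") before auditwheel repair would then leave the system libraries untouched.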

dvarrazzo • Nov 08 '21 19:11