manylinux icon indicating copy to clipboard operation
manylinux copied to clipboard

Default link options causing seg-faults

Open kif opened this issue 7 years ago • 6 comments

I found a bug in pyFAI when compiled on manylinux1. All library built on "manylinux1" was segfaulting on all linux distribution but "manylinux1". The reference to the bug is: https://github.com/silx-kit/pyFAI/issues/649

The core of the bug is that on manylinux1, the linker (gcc) does not try to resolve internally the names:

  • One solution is to use manual name mangling, as implemented.
  • another solution would be to add an extra link argument: -Wl,-Bsymbolic-functions, but I am not sure gcc4.1 from manylinux1 supports it
  • The only portable options found seams to mangle the attribute of the function (attribute((visibility("hidden")))) but this is far from being intuitive.

Any opinion on this ?

kif avatar Aug 31 '17 12:08 kif

There's some discussion here: https://groups.google.com/forum/#!topic/cython-users/lgNG9ws-9-4

Unfortunately it's true, C symbol resolution is very counterintuitive...

I'm hesitant to suggest -Bsymbolic everywhere because it's a hack that breaks the normal rules for how Linux symbol resolution work (in particular it breaks LD_PRELOAD), and it only stops the code being built from getting symbols from elsewhere; it doesn't stop the code from breaking other libraries by interposing its symbols onto them. Hidden visibility is a cleaner solution. OTOH, sometimes the best solution is a hack.

Arguably the right solution is to force -fvisibility=hidden when building Python extensions, but unfortunately we can't just do that globally in the manylinux compilers, because they're also used to build shared libraries that are used by Python extensions and it would break them. I think setuptools could do this though b/c it knows whether it's building a Python extension or not? Of course this wouldn't 100% solve the problem because you could potentially have a shared library that your extension uses and that conflicts with one of the symbols that Python exports. But that's unlikely because common shared libraries have already worked out naming schemes to avoid popular libraries like libz.

Or possibly Python should stop linking directly to libz and libexpat, why do they even do that.

njsmith avatar Aug 31 '17 18:08 njsmith

Thanks Nathaniel for your input. I have learned a lot !

On Thu, 31 Aug 2017 18:17:00 +0000 (UTC) "Nathaniel J. Smith" [email protected] wrote:

There's some discussion here: https://groups.google.com/forum/#!topic/cython-users/lgNG9ws-9-4

Unfortunately it's true, C symbol resolution is very counterintuitive...

I'm hesitant to suggest -Bsymbolic everywhere because it's a hack that breaks the normal rules for how Linux symbol resolution work (in particular it breaks LD_PRELOAD), and it only stops the code being built from getting symbols from elsewhere; it doesn't stop the code from breaking other libraries by interposing its symbols onto them. Hidden visibility is a cleaner solution. OTOH, sometimes the best solution is a hack.

We were lucky to spot that bug on a small, self contained example. On larger project we are working on, we would not have investigated that deep. Numpy is explicitly hiding every c_function not be exposed on the python side, but one may forget to flag it as not to be exposed.

Arguably the right solution is to force -fvisibility=hidden when building Python extensions, but unfortunately we can't just do that globally in the manylinux compilers, because they're also used to build shared libraries that are used by Python extensions and it would break them. I think setuptools could do this though b/c it knows whether it's building a Python extension or not? Of course this wouldn't 100% solve the problem because you could potentially have a shared library that your extension uses and that conflicts with one of the symbols that Python exports. But that's unlikely because common shared libraries have already worked out naming schemes to avoid popular libraries like libz.

I am considering a solution like this which exposes only the python side of the library:

if self.compiler.compiler_type == 'unix':
        if sys.version_info[0] <= 2:
            ext.extra_compile_args.append('''-fvisibility=hidden -D'PyMODINIT_FUNC=__attribute__((visibility("default"))) void ' ''')
        else:  # Python3
            ext.extra_compile_args.append('''-fvisibility=hidden -D'PyMODINIT_FUNC=__attribute__((visibility("default"))) PyObject* ' ''')

apparently it works ... but their may be other side effects.

Or possibly Python should stop linking directly to libz and libexpat, why do they even do that.

Libz has little to do in this story. It could have been any third party library.

Cheers,

Jérôme Kieffer tel +33 476 882 445

kif avatar Sep 01 '17 07:09 kif

-D'PyMODINIT_FUNC=__attribute__((visibility("default"))) void ' ''')

Are you sure this is necessary? I think PyMODINIT_FUNC should already set symbol visibility correctly by default. (It uses __declspec(dllexport), but gcc recognizes that as an alias for __attribute__((visibility("default"))).)

Libz has little to do in this story. It could have been any third party library.

Not true! Only libraries that the Python interpreter is explicitly linked to (i.e., the ones that show up if you do ldd python) get forced onto extension modules. Otherwise, extension modules and the libraries they use are loaded into separate namespaces so they can't interfere with each other. So on my system it's literally only libz and libexpat whose symbols can cause collisions.

(Well, life gets more complicated if you're embedding Python into another program -- then all that program's symbols can also cause problems. But let's worry about that another time...)

njsmith avatar Sep 01 '17 10:09 njsmith

On Fri, 01 Sep 2017 10:01:45 +0000 (UTC) "Nathaniel J. Smith" [email protected] wrote:

-D'PyMODINIT_FUNC=__attribute__((visibility("default"))) void ' ''')

Are you sure this is necessary? I think PyMODINIT_FUNC should already set symbol visibility correctly by default. (It uses __declspec(dllexport), but gcc recognizes that as an alias for __attribute__((visibility("default"))).)

We did something like this: https://github.com/silx-kit/pyFAI/pull/663/commits/a03f2bafe9ca963c4fbe7faadb7c38cb892e92c8#diff-2eeaed663bd0d25b7e608891384b7298 so that only the Python function get's exposed.

I checked that no more symbols get exposed (beside the one for Python), manylinx wheels work outside their environment, C-import between Cython modules apparently still works (using a cdef-class from another extension still works), It looks OK from our perspective.

Libz has little to do in this story. It could have been any third party library.

Not true! Only libraries that the Python interpreter is explicitly linked to (i.e., the ones that show up if you do ldd python) get forced onto extension modules. Otherwise, extension modules and the libraries they use are loaded into separate namespaces so they can't interfere with each other. So on my system it's literally only libz and libexpat whose symbols can cause collisions.

I did not know about this. It explains why it did not crashing more often.

(Well, life gets more complicated if you're embedding Python into another program -- then all that program's symbols can also cause problems. But let's worry about that another time...)

I do agree... but a colleague of mine is working on embedding Python in Octave :/

Thanks for your input.

kif avatar Sep 01 '17 19:09 kif

My question was: if you leave out the -DPyMODINIT_FUNC=... flag, and just use -fvisibility=hidden alone, then does that work?

njsmith avatar Sep 02 '17 01:09 njsmith

On Sat, 02 Sep 2017 01:02:22 +0000 (UTC) "Nathaniel J. Smith" [email protected] wrote:

My question was: if you leave out the -DPyMODINIT_FUNC=... flag, and just use -fvisibility=hidden alone, then does that work?

Nop, no symbols are exposed then (still gcc/unix) and of course no module could be loaded at the python level.

As the behaviour of Windows is the opposite (everything is hidden except what is explicitly exposed), a preprocessor macro PyMODINIT_FUNC already exists and is used to exposed under windows. It does nothing under unix by default. This patch just mimics the behaviour of windows on linux.

Cheers,

Jerome

kif avatar Sep 02 '17 06:09 kif