extension_module in cross compilation gives wrong architecture name

Open Ricardicus opened this issue 5 years ago • 3 comments

Describe the bug I am cross-compiling a Python extension module in a project I am working on into a shared library for an aarch64 platform. The resulting file is named "pyworker.cpython-37m-x86_64-linux-gnu.so", which is not right. My build machine is x86_64, but that is not the target platform. In fact, if I run "file" to see what type of file "pyworker.cpython-37m-x86_64-linux-gnu.so" is, I get:

$ file pyworker.cpython-35m-x86_64-linux-gnu.so 
pyworker.cpython-35m-x86_64-linux-gnu.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=1d6a9aed7a2e84d8b4a0ab154c96873a1732b1d7, with debug_info, not stripped

I have not found a way to change this name to something more appropriate, and that is a problem for me.
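The mismatch between the file name and the actual binary can also be checked programmatically. Below is a minimal sketch that reads the `e_machine` field of an ELF header and compares it with the architecture tag in the file name. The helper names (`elf_machine`, `name_matches_binary`) are made up for illustration; the two machine codes are the standard ELF values for x86-64 and AArch64.

```python
import struct

# e_machine values defined by the ELF specification (only the two relevant here)
ELF_MACHINES = {0x3E: "x86_64", 0xB7: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the architecture name encoded in the first 20 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_ident and e_type
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, hex(machine))


def name_matches_binary(filename: str, header: bytes) -> bool:
    """Hypothetical check: does the architecture in the name match the binary?"""
    return elf_machine(header) in filename
```

To use it on a real file, read the header with `open(path, "rb").read(20)` and pass the bytes in. For the file shown above, this kind of check would report aarch64 for the binary while the name says x86_64.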

To Reproduce The code in the meson.build looks like this:

pymod = import('python')
py = pymod.find_installation('python3')

py.extension_module('pyworker',
                    sources : python_sources,
                    ...
                    install: true
                    )

I am using a meson.cross file with

[binaries]
....

[properties]
...

[host_machine]
system = 'linux'
cpu_family = 'aarch64'
cpu = 'aarch64'
endian = 'little'

[target_machine]
system = 'linux'
cpu_family = 'aarch64'
cpu = 'aarch64'
endian = 'little'

Expected behavior I expect the file name to contain the architecture name aarch64 rather than x86_64.

system parameters

  • meson version: 0.54.0
  • OS: Linux Ubuntu 19.04 GNU/Linux x86_64
  • This is a cross build, outlined above.
  • Python version: 3.7.3
  • Ninja version: 1.9.0.git.kitware.dyndep-1.jobserver-1

Ricardicus, Apr 28 '20 09:04

This is a bug in the module as a whole, it always uses the running python interpreter to get the path to install modules to, which isn't right.

dcbaker, Apr 30 '20 17:04

it always uses the running python interpreter

It doesn't look like that to me. :/ INTROSPECT_COMMAND is run by the python returned from find_installation, and architecture name etc. comes from the introspected EXT_SUFFIX and other sysconfig variables.

It seems to me that the problem here is pymod.find_installation('python3') is detecting the running python interpreter, because the python binary isn't being overridden in the cross file. The solution is to override the python binary in the cross file.
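To illustrate why the choice of interpreter determines the file name, here is a rough sketch of asking a given Python binary for its `EXT_SUFFIX`. This is not Meson's actual introspection script; the function names are hypothetical, and only `sysconfig.get_config_var('EXT_SUFFIX')` is the real stdlib call.

```python
import subprocess
import sys


def ext_suffix(python: str) -> str:
    """Ask the given interpreter (not the running one) for its EXT_SUFFIX."""
    result = subprocess.run(
        [python, "-c",
         "import sysconfig; print(sysconfig.get_config_var('EXT_SUFFIX'))"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def extension_filename(module: str, python: str) -> str:
    """File name an extension module would get for that interpreter."""
    return module + ext_suffix(python)


if __name__ == "__main__":
    # With the build machine's interpreter this yields the build architecture.
    print(extension_filename("pyworker", sys.executable))
```

If the interpreter handed to `ext_suffix` is the build machine's Python, the suffix carries the build machine's architecture, which matches the behavior reported in this issue.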

I do see a different problem, which is that it doesn't seem to be hooked up to exe_wrapper and the introspection command may just utterly fail, resulting in the python module reporting that it is "not a valid python" and returning not-found.

eli-schwartz, Nov 24 '21 04:11

Somewhat related to this, pymod.find_installation() should build native: true modules.

bonzini, Nov 28 '22 21:11

It seems to me that the problem here is pymod.find_installation('python3') is detecting the running python interpreter, because the python binary isn't being overridden in the cross file. The solution is to override the python binary in the cross file.

I ran into this issue and am trying to fix this bug. But there's an issue here I think.

State right now: for a setup with a regular build and host envs with a Python installed in both, and without specifying python in the [binaries] section of the cross file, I do get a completed build. Things are probably subtly broken beyond the x86-64 in extension module names, because the introspection is looking at the build env's Python interpreter. But the build completes - so as of today things do work with crossenv (and we see that when building SciPy for conda-forge, all tests pass for x86-64 -> aarch64).

The problem with trying to fix this issue: say I have a cross file containing:

[constants]
arch = 'aarch64-conda-linux-gnu'

[binaries]
c = arch + '-gcc'
cpp = arch + '-g++'
fortran = arch + '-gfortran'
ar = arch + '-ar'
strip = arch + '-strip'
pkgconfig = 'pkg-config'

and now I want to specify the host Python binary:

python = prefix + 'bin/python3'

That will fail with:

ERROR: <PythonExternalProgram 'python' -> ['/home/rgommers/mambaforge/envs/host-env/bin/python3']> is not a valid python or it is missing distutils

Makes sense - that's an aarch64 Python binary and it won't run. However, when adding:

exe_wrapper = 'qemu-aarch64-static'

to make host Python run, the effect is that exe_wrapper applies to all specified binaries in the cross file, so we immediately get an error like:

../meson.build:1:0: ERROR: Executables created by cpp compiler aarch64-conda-linux-gnu-g++ are not runnable.

because the cross-compilers are also exe-wrapped. The docs don't talk about this; for CMake specifically there seems to be a hack (cmake_use_exe_wrapper) to work around this, but it seems to me that whether or not an exe_wrapper is needed should be specified per binary. Am I missing something here?

rgommers, Mar 07 '23 09:03

This is still quite confusing - I'd like to get some confirmation of what the intended design is here.

We have the following ingredients:

(1) a Python executable and dependency object from this canonical snippet in a meson.build file:

py = import('python').find_installation(pure: false)
py_dep = py.dependency()

(2) a native file which identifies a Python interpreter (used by meson-python)

[binaries]
python = '/path/to/build/env/python'

(3) a cross file which may identify a Python interpreter (possibly necessary for all cross builds, unclear to me):

[binaries]
python = '/path/to/host/env/python'

(4) we may have to run the native Python interpreter for, e.g., codegen tasks. This can be done in two ways:

# the .py file here must contain a shebang `#!/usr/bin/env python3`
run_command('_codegen_script.py')

_decomp_update_pyx = custom_target('_a_target',
  output: '_outfile.ext',
  input: '_outfile.ext.in',
  command: [_codegen_script.py, '@INPUT@', '-o', '@OUTDIR@']
)

or:

# here `py` is the interpreter found by `py_mod.find_installation`
_decomp_update_pyx = custom_target('_a_target',
  output: '_outfile.ext',
  input: '_outfile.ext.in',
  command: [py, _codegen_script.py, '@INPUT@', '-o', '@OUTDIR@']
)

I believe @eli-schwartz suggested elsewhere that the shebang method must always be preferred. In SciPy I mostly use the custom_target(..., command: [py, ...]) method though, which has worked so far. But if py is the host interpreter, that's obviously not ideal, and using #!/usr/bin/env python3 is potentially problematic when you're running a non-default build interpreter and need specific packages like f2py installed in that interpreter for code generation.

(5) We also have a python3 dependency:

`dependency('python')`

with its docs stating: "Note that python3 found by this dependency might differ from the one used in python3 module because modules uses the current interpreter, but dependency tries pkg-config first."

It looks like @dcbaker and @bonzini are assuming that the Meson design is that py.find_installation returns the build (native) interpreter, and @eli-schwartz is assuming it's the host (cross) interpreter. Given the docs in https://mesonbuild.com/Python-module.html and the methods on the interpreter and dependency object, I'd say that @eli-schwartz is correct here. However, the note quoted above for the dependency hints at @dcbaker's interpretation. That is hard to understand though, because if I'm doing a cross build, why would I ever want to use things like py.extension_module and py.install_sources if py would be the build env's interpreter? And also, when we're not explicitly specifying the path to the host interpreter in the cross file, a cross build picks the build environment's interpreter by default. That seems very much inconsistent.

Thoughts?

rgommers, Mar 20 '23 11:03

@dcbaker @eli-schwartz, could we get your thoughts here? This will influence the upcoming scipy release, in the sense that some distributors have already said they'll have to carry patches (which we cannot merge into SciPy proper before the terminology is clearer).

h-vetinari, May 20 '23 07:05

I'd like to fix this by taking inspiration from Gentoo's gpep517, which has already solved the problem. In Gentoo, we try hard to avoid exe wrappers like QEMU, as they greatly complicate matters for the package manager.

Both Meson and gpep517 find Python's stdlib path by cheating slightly, combining the target host's prefix with the build host's Python stdlib path. This assumes that Python is configured in a similar way between the build and target hosts, but it's probably the best we can do.

gpep517 goes one step further and uses the target's sysconfigdata file from stdlib. The exact filename is hard to predict, but there should only be one, so it looks for _sysconfigdata_*.py. This is loaded using importlib (which Meson already uses) and various settings are then used to perform the build, including EXT_SUFFIX, which is used to determine the extension filename.
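As a rough illustration of that idea (not gpep517's or Meson's actual code), the sketch below globs for the single `_sysconfigdata_*.py` file in a target stdlib directory, loads it with importlib, and reads `EXT_SUFFIX` from its `build_time_vars` dict. The function names and the `target_stdlib` parameter are assumptions made for this example.

```python
import glob
import importlib.util
import os


def load_target_sysconfigdata(target_stdlib: str) -> dict:
    """Load build_time_vars from the target's _sysconfigdata_*.py file."""
    matches = glob.glob(os.path.join(target_stdlib, "_sysconfigdata_*.py"))
    if len(matches) != 1:
        # The approach relies on there being exactly one such file.
        raise RuntimeError(f"expected one sysconfigdata file, found {len(matches)}")
    # Load the file by path; the module name used here is arbitrary.
    spec = importlib.util.spec_from_file_location("_target_sysconfigdata", matches[0])
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.build_time_vars


def target_extension_filename(module: str, target_stdlib: str) -> str:
    """Extension file name derived from the target's EXT_SUFFIX."""
    return module + load_target_sysconfigdata(target_stdlib)["EXT_SUFFIX"]
```

Because the file is plain Python containing a dict, it can be loaded on the build machine without running the target interpreter, which is what makes this attractive when exe wrappers are to be avoided.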

This is probably best done in python_info.py, but I'll see what works.

chewi, Aug 03 '23 12:08

You can do this without any explicit meson support. Set your python binary to a wrapper script that exports $_PYTHON_SYSCONFIGDATA_NAME before running your own build_machine python -- this will then get used by meson's introspection script. The same trick has been commonly done for setuptools-based cross compilation. You can also export it before running meson, but that may mess up cases where you find multiple installations in one meson.build.
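A wrapper along those lines might look roughly like the following sketch. The function names, the `sysconfigdata_dir` parameter, and the use of `PYTHONPATH` to make the target's sysconfigdata module importable are my assumptions for illustration; only the `_PYTHON_SYSCONFIGDATA_NAME` variable comes from the description above.

```python
import os


def cross_python_env(sysconfigdata_name, sysconfigdata_dir, base_env=None):
    """Build the environment for running the build machine's Python
    with the target's sysconfigdata module selected."""
    env = dict(os.environ if base_env is None else base_env)
    env["_PYTHON_SYSCONFIGDATA_NAME"] = sysconfigdata_name
    # Assumption: the directory holding the target's sysconfigdata file
    # must be importable, so it is prepended to PYTHONPATH.
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = (
        sysconfigdata_dir if not existing
        else sysconfigdata_dir + os.pathsep + existing
    )
    return env


def exec_build_python(build_python, args, env):
    """Replace the current process with the build Python (POSIX-style wrapper)."""
    os.execve(build_python, [build_python, *args], env)
```

A small script that calls `exec_build_python(path_to_build_python, sys.argv[1:], cross_python_env(...))` could then be set as `python` in the `[binaries]` section, so only that one lookup sees the modified environment.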

The problem with this approach is that not everything is in sysconfigdata, although several crucial cross-compilation things are, in particular the architecture name. Things that aren't:

  • stable ABI extensions, uses importlib.machinery
  • links_against_libpython (can be fixed for newer pythons), uses distutils
  • the _INSTALL_SCHEMES dict, including some highly arbitrary vendor patches such as the deb_system scheme owned by Debian, uses sysconfig.py but not sysconfigdata
  • config vars like py_version_short, py_version_nodot as used by the install schemes dict and currently by our dependency factory as well, uses sysconfig.py but not sysconfigdata

It's not clear what to do for these bits of data aside from actually running the cross python itself.
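For reference, here is a small sketch of where some of those values can be queried in a running interpreter. It only shows the stdlib calls involved, not how Meson's introspection collects them; the function name and dict keys are made up for this example, and the distutils-based `links_against_libpython` check is left out.

```python
import importlib.machinery
import sysconfig


def interpreter_only_info() -> dict:
    """Values computed by the running interpreter rather than read
    from the sysconfigdata file."""
    return {
        # Includes the stable ABI suffix where the platform supports one.
        "extension_suffixes": list(importlib.machinery.EXTENSION_SUFFIXES),
        # Set up by sysconfig.py at runtime.
        "py_version_short": sysconfig.get_config_var("py_version_short"),
        "py_version_nodot": sysconfig.get_config_var("py_version_nodot"),
        # Install schemes, possibly including vendor-patched ones.
        "scheme_names": list(sysconfig.get_scheme_names()),
        "install_paths": sysconfig.get_paths(),
    }
```

Since all of these describe whichever Python executes the code, running this on the build machine reports the build interpreter's values, which is the limitation described above.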

eli-schwartz, Aug 25 '23 03:08

See #12190 for a fix.

chewi, Aug 31 '23 22:08