
Feedback on manylinux related PEP

Open chrisantonellis opened this issue 4 years ago • 1 comments

Hello! I've been working on a PEP that I want to bring to the attention of manylinux contributors for feedback and critique.

Titled Build Dependency Specification for Manylinux Wheels, it proposes a data spec for capturing the steps required to set up a container for compiling a manylinux wheel. Some packages require modifications to the environment to build correctly (system dependencies installed, Python packages installed, etc.), so there would be great benefit to standardizing this data: it could be read by builders and installers, and contributed to by the open source community.

This PEP was contributed to the python-ideas mailing list as an idea here

  • https://mail.python.org/archives/list/[email protected]/thread/WPGI4LYOK4XHDSWFFFRIBBHNVZOKBGFT/

and was contributed as a draft PEP here

  • https://mail.python.org/archives/list/[email protected]/thread/GJX77GBYLTNOICXFHHHN5JOBKZM43O25/

Any and all feedback appreciated! Thank You

Abstract

This document specifies how Python software packages should specify what actions are required to modify a standard manylinux environment to correctly build and bundle the package into a manylinux wheel.

Motivation

Python wheels with compiled extensions may link to system libraries, requiring that those libraries be available on the host system (the system the wheel is installed on) to operate correctly. The manylinux project solves this problem through the auditwheel package, which inspects a built wheel for the external shared libraries it links against and bundles the required libraries into the wheel. This allows the Python library, its compiled extensions, and any required system libraries to be installed on a host system without having to install the system libraries directly.

In an ideal world, all package authors would make use of the manylinux project and all python packages that require system libraries would provide compiled, bundled distributions on PyPI. However, this is not the case, and many packages do not.
There are valid instances where an author may not provide a manylinux wheel by choice: for example when a required system library cannot be bundled due to licensing. However, there are packages on PyPI that do not provide a wheel when one could be provided. This means that these libraries require local compilation prior to use, resulting in multiple negative side effects for the end user:

  • Required system libraries are not easily determined. They must be gleaned from project-specific documentation with no standardized format.
  • Compiling extensions can take a long time, adding expense to rebuilding environments.
  • Compiling a wheel that requires system libraries is non-trivial; it is easy to mismatch system library and Python library versions and be presented with cryptic error messages.

Some authors do provide manylinux wheels on PyPI by making use of the manylinux project. However, the manylinux project does not provide a standardized way to capture environment setup data.
This results in package authors keeping this data in project documentation or sometimes not recording it at all.

Rationale

This PEP proposes a common format for the data required to correctly set up manylinux environments to compile a wheel with required system libraries. This concept is borrowed from package managers such as RPM, which uses a SPEC file to capture this data. This data can be used in a manylinux container to set up the environment prior to compiling, resulting in a valid manylinux wheel. This data can be standardized to allow for automated building of manylinux wheels.
Standardization of this data will allow package consumers to more easily contribute to building manylinux wheels when an existing distribution is lacking or not available.

Specification

The data will be located in the pyproject.toml file of a Python project, in a main table titled manylinux_build_specification. The data will be grouped in sub-tables named after the manylinux version they target, e.g. manylinux2014.

pyproject.toml

[manylinux_build_specification.manylinux2014]
extra_base_system_repositories = ["http://foo.com/packages/"]
system_dependencies = ["foo-1.0.0", "bar-1.0.0"]
python_dependencies = ["foo==1.0.0", "bar==1.0.0"]
environment_variables = ["FOO=BAR"]
steps = [
  "./scripts/build_and_upload.sh --my_option"
]
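
To make the layout above concrete, here is a minimal sketch of how a builder might pull one target's table out of a parsed pyproject.toml. The function names extract_spec and load_spec are hypothetical, and treating missing keys as empty lists is an assumption; the draft does not say which keys are optional. load_spec relies on tomllib, which is in the standard library from Python 3.11.

```python
from __future__ import annotations

from typing import Any

TABLE = "manylinux_build_specification"
LIST_KEYS = (
    "extra_base_system_repositories",
    "system_dependencies",
    "python_dependencies",
    "environment_variables",
    "steps",
)


def extract_spec(pyproject: dict[str, Any], target: str) -> dict[str, list[str]]:
    """Return the spec for one manylinux target from already-parsed TOML data.

    Assumption: every key is optional and defaults to an empty list.
    """
    table = pyproject.get(TABLE, {}).get(target)
    if table is None:
        raise KeyError(f"no [{TABLE}.{target}] table found")
    spec: dict[str, list[str]] = {}
    for key in LIST_KEYS:
        value = table.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise TypeError(f"{key} must be a list of strings")
        spec[key] = value
    return spec


def load_spec(path: str, target: str) -> dict[str, list[str]]:
    """Read pyproject.toml from disk and extract the spec for one target."""
    import tomllib  # standard library from Python 3.11

    with open(path, "rb") as fh:
        return extract_spec(tomllib.load(fh), target)
```

A builder would call something like load_spec("pyproject.toml", "manylinux2014") and pass the resulting lists on to the setup stages described below.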

All actions will be performed within a manylinux image. Given that the manylinux project uses CentOS as the base linux flavor, we can assume the following:

  • Use of yum for system package management
  • Python versions available in /opt/python/

extra_base_system_repositories

Additional repositories to add to yum prior to installing system dependencies. This allows access to builds of system libraries with the most up-to-date patches, etc.

system_dependencies

System dependencies to install with yum prior to building. Entries are expected to be in yum name-version format.
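
As a rough illustration, the two fields above could be turned into yum invocations like this. The helper name yum_commands is hypothetical, and using yum-config-manager (part of yum-utils) to register repositories is an assumption: the draft does not say how repositories are added, and the tool may need to be installed in the image first.

```python
from __future__ import annotations


def yum_commands(repositories: list[str], packages: list[str]) -> list[list[str]]:
    """Build the argv lists that would prepare system dependencies.

    Repositories are registered first so that the install step can see them.
    """
    commands: list[list[str]] = []
    for url in repositories:
        # Assumption: yum-config-manager (from yum-utils) is available.
        commands.append(["yum-config-manager", "--add-repo", url])
    if packages:
        # -y answers yes to prompts, since the build is non-interactive.
        commands.append(["yum", "install", "-y", *packages])
    return commands
```

Returning argv lists instead of running anything keeps the sketch easy to inspect; a builder could log the commands before executing them.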

environment_variables

Environment variables to set prior to building.
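
One possible reading of the "FOO=BAR" entries is sketched below: each entry is split on the first "=" and layered over a base environment. The helper name parse_environment and the decision to reject entries without "=" are assumptions, not part of the draft.

```python
from __future__ import annotations


def parse_environment(
    entries: list[str], base: dict[str, str] | None = None
) -> dict[str, str]:
    """Convert NAME=VALUE strings into a mapping, layered over `base`."""
    env = dict(base or {})
    for entry in entries:
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = entry.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {entry!r}")
        env[name] = value
    return env
```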

python_dependencies

Python libraries to install with pip prior to building. Will be installed for each version of python available in /opt/python/.
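
A sketch of the "each version of Python" loop follows, assuming every interpreter directory under /opt/python/ contains bin/pip, as in the manylinux images. The helper names discover_interpreters and pip_commands are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def discover_interpreters(python_root: str = "/opt/python") -> list[str]:
    """List interpreter directories under python_root that ship a pip executable."""
    return sorted(
        str(d) for d in Path(python_root).glob("*") if (d / "bin" / "pip").exists()
    )


def pip_commands(
    python_dependencies: list[str], interpreter_dirs: list[str]
) -> list[list[str]]:
    """Build one `pip install` argv per interpreter directory."""
    if not python_dependencies:
        return []
    return [
        [str(Path(d) / "bin" / "pip"), "install", *python_dependencies]
        for d in interpreter_dirs
    ]
```

Inside a container, pip_commands(spec["python_dependencies"], discover_interpreters()) would yield one command per installed Python version.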

steps

Steps to be executed sequentially with bash. The entire build process can be captured here, or this can be a call to a separate script.
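
Sequential execution with bash might look like the sketch below. Stopping at the first failing step is an assumption; the draft only says the steps run in order. The name run_steps and the injectable runner parameter are hypothetical and exist mainly so the loop can be exercised without a shell.

```python
from __future__ import annotations

import subprocess
from typing import Any, Callable


def run_steps(
    steps: list[str],
    env: dict[str, str] | None = None,
    runner: Callable[..., Any] = subprocess.run,
) -> None:
    """Run each step in order through `bash -c`.

    With the default runner, check=True raises CalledProcessError on a
    non-zero exit status, so later steps do not run after a failure.
    env=None means the child inherits the current environment.
    """
    for step in steps:
        runner(["bash", "-c", step], env=env, check=True)
```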

How to Teach This

This will be taught through examples and documentation provided in a reference implementation.

Reference Implementation

A reference implementation is currently in development. This will include the following:

  1. The data spec will be defined in an example python package.
  2. A python package will be created that consumes the data spec and sets up a manylinux container appropriately.
  3. A manylinux docker image will be created that runs the python package that consumes the data spec prior to building a wheel.

References

  • PEP 508 -- Dependency specification for Python Software Packages: https://www.python.org/dev/peps/pep-0508/
  • PEP 518 -- Specifying Minimum Build System Requirements for Python Projects: https://www.python.org/dev/peps/pep-0518/
  • PEP 571 -- The manylinux2010 Platform Tag: https://www.python.org/dev/peps/pep-0571/
  • PEP 599 -- The manylinux2014 Platform Tag: https://www.python.org/dev/peps/pep-0599/
  • PEP 600 -- Future 'manylinux' Platform Tags for Portable Linux Built Distributions: https://www.python.org/dev/peps/pep-0600/
  • PEP 631 -- Dependency specification in pyproject.toml based on PEP 508: https://www.python.org/dev/peps/pep-0631/
  • RPM Packaging Guide: What is a SPEC File?: https://rpm-packaging-guide.github.io/#what-is-a-spec-file

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.

chrisantonellis, Jan 11 '21 14:01

Is anyone interested in commenting on the work above? I'm still looking for someone to act as a Sponsor for this PEP. Thank You!

chrisantonellis, Jan 15 '21 13:01