maturin
Ship both python library AND binary command line tool in wheel
I'm in a situation where I would like to build both a python library and a command line tool in a wheel for distribution. (e.g., https://python-packaging.readthedocs.io/en/latest/command-line-scripts.html , but all the code would be in rust.)
The Cargo.toml would contain:
...
[lib]
crate-type = ["staticlib", "cdylib"]
name = "my_library"
path = "src/lib.rs"
[[bin]]
name = "my_commandline"
path = "src/cli.rs"
...
Is it possible to do this in maturin?
I've tried maturin build -b "bin", which only builds the command line tool (my_commandline) and not the library.
maturin build -b "pyo3" likewise only builds the library (my_library) and not the command line tool.
Is it currently possible to build with both "bin" and "pyo3" bindings in a single wheel?
To ship both a library and a binary, you'd need to build twice and ship both artifacts, which puts two copies of each dependency in the wheel. Would it be possible to work around this by adding a pseudo-main function to the Rust library and creating a small Python script that only calls it?
Does anyone know how to control this from pyproject.toml, so that pip wheel or pip install builds either the pyo3 or the bin type of wheel, since we can't get both into the same wheel?
After #948 we now have the skeleton to allow shipping both lib and bin; the remaining question is how to let users configure it. We'd still want maturin to build only lib or bin by default, and to build both only when asked to.
I'm thinking about adding the following options to pyproject.toml (and Cargo.toml):
[[tool.maturin.targets]]
name = "example"
kind = "bin"
[[tool.maturin.targets]]
name = "example"
kind = "lib"
Then you will get a libexample and an example binary in the wheel.
I don't quite like the targets metadata name, since it is easily confused with cargo's --target, but it's the name used by cargo metadata.
Sounds good to me. Agreed that targets is a little confusing, though it really is the name Cargo uses for these things: https://doc.rust-lang.org/cargo/reference/cargo-targets.html#cargo-targets
I like it, and while targets might be a little confusing at first blush, it seems to be a community standard; it is not an inordinate cognitive barrier and is easy to document.
Thanks for the feedback!
While thinking more about this, there is a question of how the existing --bindings option interacts with targets: there might be cases where the lib uses pyo3 but the bin is just a simple binary that doesn't link to libpython.
The bindings detection might need to become per-target, with users able to override it in the targets configuration:
[[tool.maturin.targets]]
name = "example"
kind = "bin"
bindings = "pyo3" # example links to libpython
[[tool.maturin.targets]]
name = "example"
kind = "lib"
bindings = "cffi" # or pyo3 and others
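To make the per-target idea concrete, here is a rough sketch of the resolution rule the proposal implies: an explicit bindings value on a target wins, otherwise the auto-detected value is used. This is only an illustration of the proposed semantics, not maturin's implementation; the function name resolve_bindings, the plain-dict representation of the TOML tables, and the detected values are all assumptions.

```python
from typing import Dict, List, Tuple

# Mirrors the proposed [[tool.maturin.targets]] array of tables as plain dicts.
TARGETS: List[Dict[str, str]] = [
    {"name": "example", "kind": "bin"},                      # no override
    {"name": "example", "kind": "lib", "bindings": "cffi"},  # explicit override
]


def resolve_bindings(
    targets: List[Dict[str, str]],
    detected: Dict[Tuple[str, str], str],
) -> Dict[Tuple[str, str], str]:
    """Pick bindings per (name, kind): explicit override first, else the detected value."""
    resolved: Dict[Tuple[str, str], str] = {}
    for target in targets:
        key = (target["name"], target["kind"])
        if key in resolved:
            raise ValueError(f"duplicate target {key}")
        resolved[key] = target.get("bindings") or detected[key]
    return resolved


if __name__ == "__main__":
    # Hypothetical detection result: the bin does not link libpython, the lib uses pyo3.
    detected = {("example", "bin"): "bin", ("example", "lib"): "pyo3"}
    print(resolve_bindings(TARGETS, detected))
```

Keying on the (name, kind) pair rather than the name alone matters here, because the example uses the same name for both the bin and the lib target.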
We've been thinking about this as well - while this issue keeps moving forward, has anyone found a workaround to package both "bin" and "pyo3" bindings in a single wheel in the meantime?
Here is my workaround, though I can't guarantee its correctness.
The idea is to build once for each binding and merge the built wheels.
#!/bin/bash
set -e
# Implementation:
TMPDIR=.merge-tmp
rm -rf "$TMPDIR"
mkdir -p "$TMPDIR/tmp1"
mkdir -p "$TMPDIR/tmp2"
# Build the wheels; extra arguments are forwarded to both maturin invocations.
# Note that for my specific use case, the "python" feature is needed. You might want to change it.
maturin build -F python --release --bindings pyo3 -o "$TMPDIR/tmp1" "$@"
maturin build -F python --release --bindings bin -o "$TMPDIR/tmp2" "$@"
# Grab info (assumes exactly one wheel per output directory)
file_name=$(basename "$TMPDIR"/tmp1/*.whl)
# The bin wheel may carry different tags in its file name, so look it up separately.
bin_file_name=$(basename "$TMPDIR"/tmp2/*.whl)
dist_info=$(unzip -qql "$TMPDIR/tmp1/$file_name" | grep "\.dist-info/METADATA" | awk '{print $4}' | cut -d/ -f1)
name_version=$(basename -s '.dist-info' "$dist_info")
# Merge wheels (files from the bin wheel overwrite same-named files from the pyo3 wheel)
mkdir -p "$TMPDIR/merged"
unzip -qo "$TMPDIR/tmp1/$file_name" -d "$TMPDIR/merged"
unzip -qo "$TMPDIR/tmp2/$bin_file_name" -d "$TMPDIR/merged"
# Merge RECORD files
unzip -qjo "$TMPDIR/tmp1/$file_name" "*.dist-info/RECORD" -d "$TMPDIR/tmp1"
unzip -qjo "$TMPDIR/tmp2/$bin_file_name" "*.dist-info/RECORD" -d "$TMPDIR/tmp2"
cat "$TMPDIR/tmp1/RECORD" "$TMPDIR/tmp2/RECORD" | sort -u > "$TMPDIR/merged/$name_version.dist-info/RECORD"
# Create the wheel in the current directory, named after the pyo3 wheel
cd "$TMPDIR/merged"
zip -qr "../../$file_name" *
cd ../..
rm -rf "$TMPDIR"
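For anyone who would rather not depend on zip/unzip, the same merge idea can be sketched in Python with only the standard library. This is a sketch under assumptions, not a tested replacement: the name merge_wheels is made up, it assumes both wheels share the same .dist-info directory name, it keeps the pyo3 wheel's copy of any file that exists in both wheels (so the output should probably be named like the pyo3 wheel), and it regenerates RECORD instead of concatenating the two originals. It also assumes no file path contains a comma, which would need CSV quoting.

```python
import base64
import hashlib
import zipfile
from pathlib import Path


def record_line(name: str, data: bytes) -> str:
    """Format one RECORD row: path, urlsafe-base64 sha256 without padding, size."""
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=")
    return f"{name},sha256={digest.decode()},{len(data)}"


def merge_wheels(lib_wheel: Path, bin_wheel: Path, out_wheel: Path) -> None:
    """Write the union of both wheels to out_wheel; on conflicts the lib wheel wins."""
    files = {}
    record_name = None
    for wheel in (lib_wheel, bin_wheel):
        with zipfile.ZipFile(wheel) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                if info.filename.endswith(".dist-info/RECORD"):
                    record_name = record_name or info.filename
                    continue  # RECORD is regenerated below
                # Keep the first copy of each path (metadata from the lib wheel).
                files.setdefault(info.filename, (info, zf.read(info)))
    if record_name is None:
        raise ValueError("no .dist-info/RECORD found in either wheel")
    lines = [record_line(name, data) for name, (_, data) in files.items()]
    lines.append(f"{record_name},,")  # RECORD lists itself without hash or size
    with zipfile.ZipFile(out_wheel, "w", zipfile.ZIP_DEFLATED) as out:
        for info, data in files.values():
            out.writestr(info, data)  # reusing ZipInfo keeps file permissions
        out.writestr(record_name, "\n".join(lines) + "\n")
```

Usage would look like merge_wheels(Path("tmp1/pkg.whl"), Path("tmp2/pkg.whl"), Path("dist/pkg.whl")), where the paths are placeholders for the two maturin outputs.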
Here to say +1 for this feature. Love the project, great work.
For those commenting here, could you also comment on why using an entrypoint that calls a pyo3/cffi function in the shared library doesn't work for you?
Just confirming that this means adding an entry point to [project.scripts] in pyproject.toml that calls a pseudo-main function defined in the Rust lib (e.g. run_cli).
I have done this and it works as expected. The solution is simple, and better documentation of this option could spare users the overhead mentioned above.
Thank you.
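For readers looking for the shape of that entry-point setup, here is a minimal sketch of the Python side. The assumptions: the extension module is importable as my_library and exposes run_cli taking the argument list and returning an exit code (that signature is a guess, not something stated above), and the wrapper lives in a hypothetical module my_package/cli.py that would be registered in [project.scripts] as my_commandline = "my_package.cli:main".

```python
"""Thin console-script wrapper around a Rust run_cli function (sketch)."""
import sys
from typing import Optional, Sequence

try:
    # Hypothetical pyo3 extension module built by maturin.
    from my_library import run_cli
except ImportError:
    # Stand-in so the sketch runs without the compiled extension.
    def run_cli(argv: Sequence[str]) -> int:
        print("run_cli called with", list(argv))
        return 0


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Forward command-line arguments to Rust and return its exit code."""
    args = list(sys.argv[1:] if argv is None else argv)
    return int(run_cli(args))


if __name__ == "__main__":
    sys.exit(main())
```

Keeping the wrapper this thin means all argument parsing stays in Rust; the cost is that every invocation still starts a Python interpreter first.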
I don't think it's a matter of "doesn't work" so much as the perceived inefficiency of loading a Python interpreter to launch a Rust program.
Might be worth a performance comparison to demonstrate one way or the other whether it's an actual problem. If it goes beyond "nice to have", I can maybe attempt something in the future.
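A rough starting point for such a comparison could be a small timing harness like the one below. It only measures the baseline cost of starting the interpreter and doing nothing; the helper name time_command is made up, and the commented-out command at the end is a placeholder for whatever native binary or entry-point script you want to compare against.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Sequence


def time_command(cmd: Sequence[str], runs: int = 10) -> List[float]:
    """Return wall-clock seconds for each of `runs` executions of `cmd`."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        samples.append(time.perf_counter() - start)
    return samples


if __name__ == "__main__":
    # Baseline: cost of starting the interpreter and doing nothing.
    python_startup = time_command([sys.executable, "-c", "pass"])
    median_ms = statistics.median(python_startup) * 1000
    print(f"python startup: median {median_ms:.1f} ms")
    # To compare, time the native binary and the entry-point script the same way:
    # time_command(["my_commandline", "--version"])  # hypothetical binary and flag
```

The numbers include process spawn overhead from the harness itself, so they are best read as relative differences between commands rather than absolute startup times.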
For those commenting here, could you also comment on why using an entrypoint that calls a pyo3/cffi function in the shared library doesn't work for you?
My use case for this is: I have a binary in Rust that does supervision/instrumentation of a Python program, and it comes with a small Python extension module to help the Python program integrate properly. Launching an entire Python interpreter in order to supervise my other Python interpreter could work, but it's pretty wasteful (memory + startup speed), and more importantly it adds fragility. The whole reason I'm running my Rust binary is because I don't 100% trust my Python interpreter, and I want Rust to keep an eye on it! Also, I might e.g. want to run the Rust binary ad hoc from outside a container to debug a Python that's inside the container, so I don't have easy access to the Python environment.
For those commenting here, could you also comment on why using an entrypoint that calls a pyo3/cffi function in the shared library doesn't work for you?
FWIW, https://github.com/deshaw/nbstripout-fast would be the same sort of idea as @njsmith's. We need to ship a Rust binary because launching Python is too slow (that's a huge fraction of why we wrote it in Rust).
Our setup today is:
- Unit tests in python (because the notebook API is in python). This is not shipped.
- Ship only a rust binary packaged with python (so you can pip install this)
Ideally we'd add:
- Call this from other python programs
@njsmith @mlucool Thank you, those are very helpful replies and make a great case for including this feature.
For further design, do you think it would make sense for maturin to automatically produce two different wheels, one for the binary and one for the library, that could depend on each other, either unconditionally or through an extra? That way the user could, for example, install only the binary or only the Python module without having to download and install twice the size, and the binary wheel wouldn't depend on the Python interpreter. (If, on the other hand, you always need them together anyway, or a single wheel is a requirement, then only putting both in the same wheel would make sense.)
Might be worth a performance comparison to demonstrate one way or the other whether it's an actual problem. If it goes beyond "nice to have", I can maybe attempt something in the future.
I'm always happy about real-world benchmark numbers!
I suspect flexibility is always what people want :). In my case, my Rust program is small and no one would notice double the size, so that would be the default either way for my use case.