Ship both python library AND binary command line tool in wheel

Open PeterQLee opened this issue 4 years ago • 19 comments

I'm in a situation where I would like to build both a python library and a command line tool in a wheel for distribution. (e.g., https://python-packaging.readthedocs.io/en/latest/command-line-scripts.html , but all the code would be in rust.)

The Cargo.toml would contain:

...
[lib]
crate-type = ["staticlib", "cdylib"]
name = "my_library"
path = "src/lib.rs"

[[bin]]
name = "my_commandline"
path = "src/cli.rs"
...

Is it possible to do this in maturin?

I've tried doing maturin build -b "bin", which only builds the command line tool (my_commandline) and not the library. maturin build -b "pyo3" likewise only builds the library (my_library) and not the command line tool.

Is it currently possible to build with both "bin" and "pyo3" bindings in a single wheel?

PeterQLee avatar Oct 25 '20 21:10 PeterQLee

For shipping both library and binary, you'd need to build twice and ship both binaries, having two copies of each dependency in the wheel. Would it be possible to work around the issue by adding a pseudo-main to the Rust library and creating a small Python script that only calls the library's main?

konstin avatar Oct 28 '20 20:10 konstin

Does anyone know how to control this from the pyproject.toml, so that pip wheel or pip install builds either the pyo3 or the bin type of wheel, since we can't get them both into the same wheel?

rmcgibbo avatar Jul 20 '21 03:07 rmcgibbo

After #948 we now have the skeleton to allow shipping both lib and bin; the remaining question is how to let users configure it. We'd still want maturin to build only lib or bin by default, and to build both only when asked to.

I'm thinking about adding the following options to pyproject.toml (and Cargo.toml):

[[tool.maturin.targets]]
name = "example"
kind = "bin"

[[tool.maturin.targets]]
name = "example"
kind = "lib"

Then you will get a libexample and example binary in the wheel.

I don't quite like the targets metadata name, since it is easily confused with cargo's --target, but it's the name used by cargo metadata.

messense avatar Aug 17 '22 02:08 messense

Sounds good to me. Agreed targets is a little confusing, though it really is the name used by Cargo for these things: https://doc.rust-lang.org/cargo/reference/cargo-targets.html#cargo-targets

davidhewitt avatar Aug 18 '22 07:08 davidhewitt

I like it and while targets might be a little confusing at first blush, it seems like a community standard and not an inordinate cognitive barrier to understand and easy to document.

dylanbstorey avatar Aug 19 '22 02:08 dylanbstorey

Thanks for the feedback!

While thinking more about this, there is a question of how the existing --bindings option interacts with targets: there might be cases where the lib uses pyo3 but the bin is just a simple binary that doesn't link to libpython.

The bindings detection might need to be changed to be per-target, and allow users to override it in targets configuration:

[[tool.maturin.targets]]
name = "example"
kind = "bin"
bindings = "pyo3"  # example links to libpython

[[tool.maturin.targets]]
name = "example"
kind = "lib"
bindings = "cffi" # or pyo3 and others

messense avatar Aug 19 '22 02:08 messense

We've been thinking about this as well - while this issue keeps moving forward, has anyone found a workaround to package both "bin" and "pyo3" bindings in a single wheel in the meantime?

ihales avatar Sep 27 '22 17:09 ihales

We've been thinking about this as well - while this issue keeps moving forward, has anyone found a workaround to package both "bin" and "pyo3" bindings in a single wheel in the meantime?

Here is my workaround, but I can't guarantee the correctness.

The idea is to build once for each of the two bindings and merge the resulting wheels.

#!/bin/bash

set -e

# Implementation:

TMPDIR=.merge-tmp

rm -rf "$TMPDIR"
mkdir -p "$TMPDIR/tmp1"
mkdir -p "$TMPDIR/tmp2"

# Build the wheel
# Note that for my specific use case, "python" feature is needed. You might want to change it.
# "$@" is quoted so that extra arguments containing spaces are passed through intact.
maturin build -F python --release --bindings pyo3 -o "$TMPDIR/tmp1" "$@"
maturin build -F python --release --bindings bin  -o "$TMPDIR/tmp2" "$@"

# Grab Info
file_name=$(basename "$(/bin/ls "$TMPDIR/tmp1"/*.whl)")
dist_info=$(unzip -qql "$TMPDIR/tmp1/$file_name" | grep "\.dist-info/METADATA" | awk '{print $4}' | cut -d/ -f1)
name_version=$(basename -s '.dist-info' "$dist_info")

# Merge wheel
mkdir -p "$TMPDIR/merged"
unzip -qo "$TMPDIR/tmp1/$file_name" -d "$TMPDIR/merged"
unzip -qo "$TMPDIR/tmp2/$file_name" -d "$TMPDIR/merged"

# Merge record
unzip -qjo "$TMPDIR/tmp1/$file_name" "*.dist-info/RECORD" -d "$TMPDIR/tmp1"
unzip -qjo "$TMPDIR/tmp2/$file_name" "*.dist-info/RECORD" -d "$TMPDIR/tmp2"
cat "$TMPDIR/tmp1/RECORD" "$TMPDIR/tmp2/RECORD" | sort | uniq > "$TMPDIR/merged/$name_version.dist-info/RECORD"

# Create the wheel

cd "$TMPDIR/merged"
zip -qr "../../$file_name" *
cd ../..
rm -rf "$TMPDIR"
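
For comparison, the same merge idea can be sketched in Python with only the standard library. This is a rough sketch, not something maturin provides: the function names `merge_wheels` and `record_line` are made up for illustration, and it assumes both wheels belong to the same project and version (one shared `.dist-info` directory), carry compatible tags, and contain no file paths with commas. Instead of concatenating the two RECORD files, it recomputes RECORD from the merged contents, so entries that differ between the two wheels do not end up listed twice.

```python
import base64
import hashlib
import zipfile


def record_line(path, data):
    """Format one RECORD row: path, urlsafe-base64 sha256 (no padding), size."""
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=")
    return f"{path},sha256={digest.decode()},{len(data)}"


def merge_wheels(lib_wheel, bin_wheel, out_wheel):
    """Merge two wheels of one project; files from lib_wheel win on conflict."""
    files = {}
    for wheel in (bin_wheel, lib_wheel):  # the later wheel overwrites the earlier
        with zipfile.ZipFile(wheel) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    files[info.filename] = (info, zf.read(info))

    # Assumption: exactly one dist-info directory is shared by both wheels.
    record_path = next(n for n in files if n.endswith(".dist-info/RECORD"))
    del files[record_path]

    rows = [record_line(name, data) for name, (_, data) in sorted(files.items())]
    rows.append(f"{record_path},,")  # RECORD lists itself without hash or size

    with zipfile.ZipFile(out_wheel, "w", zipfile.ZIP_DEFLATED) as out:
        for info, data in files.values():
            out.writestr(info, data)  # reusing ZipInfo keeps permission bits
        out.writestr(record_path, "\n".join(rows) + "\n")
```

A call such as `merge_wheels("tmp1/pkg.whl", "tmp2/pkg.whl", "pkg.whl")` (placeholder paths) would write the combined wheel. Like the shell script, it does not check that the two WHEEL files agree on tags, so the same caveat about correctness applies.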

kxxt avatar Mar 02 '23 03:03 kxxt

Here to say +1 for this feature. Love the project, great work.

nanthony007 avatar Apr 11 '23 05:04 nanthony007

For those commenting here, could you also comment on why using an entrypoint that calls a pyo3/cffi function in the shared library doesn't work for you?

konstin avatar Apr 11 '23 12:04 konstin

Just confirming that this means adding an entry point to [project.scripts] in pyproject.toml from which you call a pseudo-main function that you define in the Rust lib (e.g. run_cli).

I have done this and it works as expected. I think this solution is simple, and better documentation that this option/configuration exists could avoid the above-mentioned overhead.
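
A minimal sketch of the Python side of that setup, under stated assumptions: the extension module is called `my_library`, it exposes a function `run_cli` that takes the argument list and returns an exit code or `None`, and pyproject.toml has a `[project.scripts]` entry pointing at this `main` function. All three names are hypothetical; the thread only confirms the general pattern.

```python
import sys


def main(run_cli=None, argv=None):
    """Console-script entry point that forwards to the Rust pseudo-main."""
    if run_cli is None:
        # Hypothetical extension module and function built by maturin.
        from my_library import run_cli
    # Pass the arguments explicitly so the Rust side does not have to guess
    # which part of the process arguments belongs to the interpreter.
    args = list(sys.argv if argv is None else argv)
    code = run_cli(args)
    # Treat a missing return value as success, like a normal main().
    return 0 if code is None else int(code)
```

The `run_cli` and `argv` parameters exist only so the wrapper can be exercised without the compiled module, for example `main(run_cli=print, argv=["tool", "--help"])`. Note that this route still starts a Python interpreter for every invocation.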

Thank you.

nanthony007 avatar Apr 11 '23 16:04 nanthony007

I don't think it's a matter of "doesn't work" as much as perceived inefficiencies in loading a Python interpreter to launch a Rust program.

Might be worth a performance comparison to demonstrate one way or the other if it's an actual problem. If it goes beyond "nice to have", I can maybe attempt something in the future.
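
One rough way to get such numbers, as a sketch only: time how long a command takes to start and exit, and compare the native binary against the entry-point wrapper. The helper name `median_startup_seconds` is made up here, and the commands to compare are placeholders; wall-clock timing through `subprocess` also includes process-spawn overhead, so only the differences between commands are meaningful.

```python
import statistics
import subprocess
import sys
import time


def median_startup_seconds(cmd, runs=20):
    """Return the median wall-clock time of running cmd to completion."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    commands = {
        # Baseline: cost of starting the interpreter and doing nothing.
        "interpreter only": [sys.executable, "-c", "pass"],
        # Hypothetical: add the native binary and the console-script wrapper,
        # e.g. "native": ["my_commandline", "--help"].
    }
    for label, cmd in commands.items():
        print(f"{label}: {median_startup_seconds(cmd) * 1000:.1f} ms")
```

The interpreter-only baseline gives a lower bound on what the wrapper approach costs per invocation, before any imports of the extension module.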

dylanbstorey avatar Apr 12 '23 00:04 dylanbstorey

For those commenting here, could you also comment on why using an entrypoint that calls a pyo3/cffi function in the shared library doesn't work for you?

My use case for this is: I have a binary in Rust that does supervision/instrumentation of a Python program, and it comes with a small Python extension module to help the Python program integrate properly. Launching an entire Python interpreter in order to supervise my other Python interpreter could work, but it's pretty wasteful (memory + startup speed), and more importantly it adds fragility (the whole reason I'm running my Rust binary is because I don't 100% trust my Python interpreter, and I want Rust to keep an eye on it! Also, I might e.g. want to run the Rust binary ad hoc from outside a container to debug a Python that's inside the container, so I don't have easy access to the Python environment...).

njsmith avatar Apr 16 '23 20:04 njsmith

For those commenting here, could you also comment on why using an entrypoint that calls a pyo3/cffi function in the shared library doesn't work for you?

FWIW, https://github.com/deshaw/nbstripout-fast would be the same sort of idea as @njsmith's. We need to ship a Rust binary because launching Python is too slow (that's a huge fraction of why we wrote it in Rust).

Our setup today is:

  • Unit tests in python (because the notebook API is in python). This is not shipped.
  • Ship only a rust binary packaged with python (so you can pip install this)

Ideally we'd add:

  • Call this from other python programs

mlucool avatar Apr 19 '23 20:04 mlucool

@njsmith @mlucool Thank you, those are very helpful replies and make a great case for including this feature

For further design, do you think it would or would not make sense for maturin to automatically produce two different wheels, one for the binary and one for the library, that could potentially depend on each other, either unconditionally or through an extra? That way the user could e.g. install only the binary or only the Python module without having to download and install twice the size, and the binary wheel wouldn't depend on the Python interpreter. (If, on the other hand, you always need them together anyway, or having a single wheel is a requirement, then only putting both in the same wheel would make sense.)

konstin avatar Apr 19 '23 22:04 konstin

Might be worth a performance comparison to demonstrate one way or the other if it's an actual problem. If it goes beyond "nice to have", I can maybe attempt something in the future.

I'm always happy about real-world benchmark numbers!

konstin avatar Apr 19 '23 22:04 konstin

I suspect flexibility is always what people want :). In my case, my Rust program is small and no one would notice double the size, so that would be the default either way for my use case.

mlucool avatar Apr 19 '23 22:04 mlucool