pyo3 icon indicating copy to clipboard operation
pyo3 copied to clipboard

add `bytes` conversion

Open kykosic opened this issue 8 months ago • 9 comments

Add conversions to bytes::Bytes crate.

This can also help some use cases related to #2888 when trying to deal with PyBytes.

I would appreciate feedback with regard to BytesMut. Currently it gets converted into PyBytes, and can be extracted from PyBytes and PyByteArray. This could cause confusion for users who expect a function taking ByteMut to offer a shared mutable buffer between Python and Rust, which cannot exist without a GIL reference. It might also make sense to only offer Bytes as you can always do BytesMut::from(Bytes).

kykosic avatar May 04 '25 19:05 kykosic

I'm unsure whether this conversion is a good idea. This implementation makes a full copy of the data every time (both ways) which I think people might not expect. So I'd err on the side of having such conversions be explicit, not implicit.

mejrs avatar May 04 '25 19:05 mejrs

See also https://github.com/PyO3/pyo3/issues/1992

mejrs avatar May 04 '25 19:05 mejrs

We could probably avoid copying from Python to Rust by using Bytes::from_owner (maybe with PyBackedBytes as the owner) but going from Rust to Python would almost certainly require a copy.

I'd love it for Python bytes objects to support a similar API where they allow construction from a slice and a destructor, then we could avoid copies both directions. But we're not there yet.

davidhewitt avatar May 04 '25 20:05 davidhewitt

... so while I think this is probably a useful thing to have, it might be that it's a future useful thing to have?

davidhewitt avatar May 04 '25 20:05 davidhewitt

I think this can still be useful if python -> rust is non-copy but rust -> bytes is copy, as IntoPyObject for Vec<u8> is already doing a copy.

bschoenmaeckers avatar May 05 '25 09:05 bschoenmaeckers

In case it’s not on your radars here, the pyo3-bytes crate does this zero-copy (used by the pyo3-object_store cloud IO crate, and from the same repo).

The approach @davidhewitt suggests above is precisely the way it took:

This provides [PyBytes], a wrapper around [Bytes][::bytes::Bytes] that supports the Python buffer protocol.

This uses the new Bytes::from_owner API introduced in bytes 1.9.

In that lib a PyBytesWrapper is passed into Bytes::from_owner() (thus the owner of the underlying memory), see the FromPyObject impl:

impl<'py> FromPyObject<'py> for PyBytes {
    fn extract_bound(ob: &Bound<'py, PyAny>) -> PyResult<Self> {
        let buffer = ob.extract::<PyBytesWrapper>()?;
        let bytes = Bytes::from_owner(buffer);
        Ok(Self(bytes))
    }
}

The buffer: PyBytesWrapper itself wraps a PyBuffer<u8>.

/// A wrapper around a PyBuffer that applies a custom destructor that checks if the Python
/// interpreter is still initialized before freeing the buffer memory.
///
/// This also implements AsRef<[u8]> because that is required for Bytes::from_owner
#[derive(Debug)]
struct PyBytesWrapper(Option<PyBuffer<u8>>);

lmmx avatar May 05 '25 10:05 lmmx

... so while I think this is probably a useful thing to have, it might be that it's a future useful thing to have?

If I were to make it zero copy Python to Rust, would it still be worth merging and possibly improve to full zero-copy in the future?

kykosic avatar May 05 '25 16:05 kykosic

In case it’s not on your radars here, the pyo3-bytes crate does this zero-copy (used by the pyo3-object_store cloud IO crate, and from the same repo).

Should pyo3-bytes be merged into pyo3? If the authors are willing on that.

MarkusSintonen avatar Jun 12 '25 16:06 MarkusSintonen

I think pyo3_bytes::PyBytes gets passed to Python as a real Python type; we haven't got a great story (yet) for doing those kind of centralised reusable types in the PyO3 framework itself (currently we only have PanicException and that one is painful enough).

Perhaps in the future it'd be reasonable to have such a type, but in the meanwhile I think the path we're currently heading with this PR is more suitable.

davidhewitt avatar Jun 14 '25 08:06 davidhewitt

I had another conversation today with someone who would like to see this feature across the line. Thanks again for working on this; I added my comments on top of those already outstanding :)

Thanks for the update! Sorry I haven't had time to touch this recently, but I'll get the recommendations in over the next few days

kykosic avatar Jul 11 '25 15:07 kykosic

Apologies as I haven't found time to work on this the past week. If someone else had time to pick this up feel free, otherwise I will try my best to squeeze in a couple hours asap

kykosic avatar Jul 21 '25 15:07 kykosic

I can pick this up tonight if you don’t mind :)

bschoenmaeckers avatar Jul 21 '25 16:07 bschoenmaeckers

Closing in favour of #5252

davidhewitt avatar Jul 25 '25 12:07 davidhewitt

Thanks very much all!

davidhewitt avatar Jul 25 '25 12:07 davidhewitt