add `bytes` conversion
Add conversions to bytes::Bytes crate.
This can also help some use cases related to #2888 when trying to deal with PyBytes.
I would appreciate feedback with regard to BytesMut. Currently it gets converted into PyBytes, and can be extracted from PyBytes and PyByteArray. This could cause confusion for users who expect a function taking ByteMut to offer a shared mutable buffer between Python and Rust, which cannot exist without a GIL reference. It might also make sense to only offer Bytes as you can always do BytesMut::from(Bytes).
I'm unsure whether this conversion is a good idea. This implementation makes a full copy of the data every time (both ways) which I think people might not expect. So I'd err on the side of having such conversions be explicit, not implicit.
See also https://github.com/PyO3/pyo3/issues/1992
We could probably avoid copying from Python to Rust by using Bytes::from_owner (maybe with PyBackedBytes as the owner) but going from Rust to Python would almost certainly require a copy.
I'd love it for Python bytes objects to support a similar API where they allow construction from a slice and a destructor, then we could avoid copies both directions. But we're not there yet.
... so while I think this is probably a useful thing to have, it might be that it's a future useful thing to have?
I think this can still be useful if python -> rust is non-copy but rust -> bytes is copy, as IntoPyObject for Vec<u8> is already doing a copy.
In case it’s not on your radars here, the pyo3-bytes crate does this zero-copy (used by the pyo3-object_store cloud IO crate, and from the same repo).
The approach @davidhewitt suggests above is precisely the way it took:
This provides [PyBytes], a wrapper around [Bytes][::bytes::Bytes] that supports the Python buffer protocol.
This uses the new Bytes::from_owner API introduced in bytes 1.9.
In that lib a PyBytesWrapper is passed into Bytes::from_owner() (thus the owner of the underlying memory), see the FromPyObject impl:
impl<'py> FromPyObject<'py> for PyBytes {
fn extract_bound(ob: &Bound<'py, PyAny>) -> PyResult<Self> {
let buffer = ob.extract::<PyBytesWrapper>()?;
let bytes = Bytes::from_owner(buffer);
Ok(Self(bytes))
}
}
The buffer: PyBytesWrapper itself wraps a PyBuffer<u8>.
/// A wrapper around a PyBuffer that applies a custom destructor that checks if the Python
/// interpreter is still initialized before freeing the buffer memory.
///
/// This also implements AsRef<[u8]> because that is required for Bytes::from_owner
#[derive(Debug)]
struct PyBytesWrapper(Option<PyBuffer<u8>>);
... so while I think this is probably a useful thing to have, it might be that it's a future useful thing to have?
If I were to make it zero copy Python to Rust, would it still be worth merging and possibly improve to full zero-copy in the future?
In case it’s not on your radars here, the pyo3-bytes crate does this zero-copy (used by the pyo3-object_store cloud IO crate, and from the same repo).
Should pyo3-bytes be merged into pyo3? If the authors are willing on that.
I think pyo3_bytes::PyBytes gets passed to Python as a real Python type; we haven't got a great story (yet) for doing those kind of centralised reusable types in the PyO3 framework itself (currently we only have PanicException and that one is painful enough).
Perhaps in the future it'd be reasonable to have such a type, but in the meanwhile I think the path we're currently heading with this PR is more suitable.
I had another conversation today with someone who would like to see this feature across the line. Thanks again for working on this; I added my comments on top of those already outstanding :)
Thanks for the update! Sorry I haven't had time to touch this recently, but I'll get the recommendations in over the next few days
Apologies as I haven't found time to work on this the past week. If someone else had time to pick this up feel free, otherwise I will try my best to squeeze in a couple hours asap
I can pick this up tonight if you don’t mind :)
Closing in favour of #5252
Thanks very much all!