pyo3
pyo3 copied to clipboard
Implement `ToPyObject` for `bytes::Bytes`
The bytes
crate from Tokio is really useful for operating on bytes in a performant way: https://docs.rs/crate/bytes/1.1.0.
We use it in Pants. I'd love to convert a #[pyfunction]
we have to be able to stop acquiring the GIL inside of a future so that we can better use py.allow_threads(||)
. But lifetimes mean that we only have access to &[u8]
in the future, so we would need to convert into Vec<u8>
, which is more copying than we can tolerate, especially because iirc that Vec<u8>
would then get copied again into the PyBytes
?. https://github.com/pantsbuild/pants/blob/d62a3cd87c114e379748edbfa279f6fa11da6fd5/src/rust/engine/src/externs/interface.rs#L1399
Instead, I think I'm hoping to have that future convert the &[u8]
into Bytes
(which does zero copying). And then PyO3 will convert Bytes
into PyBytes
automatically outside of my #[pyfunction]
- allowing me to use py.allow_threads()
the whole time. Does that make sense?
I imagine this would be a new Cargo feature.
I'm very happy to contribute this!
convert the &[u8] into Bytes (which does zero copying)
Only if the slice has a static lifetime. Otherwise the slice has to be copied regardless.
This would be very welcome IMO! I've wondered about adding an optional bytes
dependency myself in the past but never got around to it. There's a bunch of optional deps like num-bigint
with conversions added in src/conversions
- you could add a new bytes.rs
file following those examples. 👍
Only if the slice has a static lifetime. Otherwise the slice has to be copied regardless.
Oh, hm, yeah you're right: https://docs.rs/bytes/1.1.0/bytes/struct.Bytes.html#method.copy_from_slice Good point. We can possibly change that internal API to return Bytes
though rather than &[u8]
, though, so this feature request would still be really helpful to Pants.
you could add a new bytes.rs file following those examples. 👍
Sweet! Thanks.
I don't think we need anything new for this. Converting Bytes
into PyBytes
works today (specifically, Bytes
autodereferences to &[u8]
):
use pyo3::prelude::*;
use pyo3::types::PyBytes;
use bytes::Bytes;
fn main() {
let bytes = Bytes::from_static(&[1,2,3,4]);
Python::with_gil(|py|{
let pb = PyBytes::new(py, &bytes);
dbg!(pb);
})
}
Looking at the capi and Bytes' implementation it doesn't look like we can do something clever here.
Bytes
may not work as a return value from #[pyfunction]
without an IntoPy<PyObject>
trait implementation? (And similarly as a #[pyfunction]
argument.)
That is true, and it would be more convenient. But it has some footgunny behavior - I can totally see someone starting with a Vec<u8>
, converting it to a Bytes
(clone one), which then gets converted (second clone) into a PyBytes. While it would in fact be more performant to just use PyBytes::new(py, &my_vec)
(cloning only once).
I don't think it would be a disaster or anything, but I think it would be a mistake to offer this api if we can't implement it in a clever way.
Perhaps this is best offered in a container-agnostic way, like impl IntoPy<Py<PyBytes>> for AsRef<[u8]>
(would that work? I haven't tried), which also requires an explicit conversion.
I believe Bytes::from(vec)
just takes ownership of the underlying buffer rather than having to copy the memory (but I could be wrong).
impl IntoPy<Py<PyBytes>> for AsRef<[u8]>
Implementations like this are possible, though I kinda don't like them because I think each Rust type should map to only one Python type. In the future I would like to remove the generic parameter from IntoPy
and instead have an associated type, but last time I looked at that it needs min_specialization
to be able to deal with some edge cases like ()
mapping to Python tuple except in return values where it should map to None
.
I believe Bytes::from(vec) just takes ownership of the underlying buffer rather than having to copy the memory (but I could be wrong).
It goes through a boxed slice, which can reallocate if there is excess capacity.
It goes through a boxed slice, which can reallocate if there is excess capacity.
Ah interesting, I wasn't aware of that. I guess that's one of those impossible API questions where different users want different things 🤷
EDIT: actually thinking about it, not reallocating and continuing to hold the extra capacity is a perf optimization that would confuse a lot of users seeing excess memory consumption, so std
probably makes the right choice.
I'm thinking the best way to do this is to create an object that wraps a Bytes
and implements the buffer protocol?
That would be very efficient!
Because of the complexity of #1444 I shy away from utility pyclasses in libraries at the moment. It's definitely something we need to figure out, however at the moment each extension module would get its own copy of such a type and they wouldn't compare equal etc.
(However we could create an example for users to adapt in their own code? Also in #1884 I hope to offer a safe way to implement buffer protocol using #[pymethods]
magic methods soon...)
An example sounds totally reasonable to me!
I definitely would not want adding this convenience to impede future foundational improvements to PyO3.