add second lifetime to `FromPyObject`
This adds the second lifetime to FromPyObject and removes FromPyObjectBound. I hope I adjusted all the bounds correctly. I believe the equivalent to our current implementation is use HRTB for the input lifetime on any generic bound. But this should only be necessary for containers that create temporary Python references during extraction. For wrapper types it should be possible to just forward the relaxed bound.
For easier trait bounds FromPyObjectOwned is introduced. This is blanket implemented for any FromPyObject type that does not depend on the input lifetime. It is intended to be used as a trait bound, the idea is inspired by serde (Deserialize <=> DeserializeOwned)
I tried to document different cases in the migration guide, but it can probably still be extended. ~Changelog entry is still missing, will write that tomorrow.~
Breaking changes:
- the second lifetime
- the
extractmethod without default - the addition lifetime bound on
extract_boundto make the default work
I propose that we add a FromPyObjectOwned trait similar to serde's DeserializeOwned. It's much easier to explain "just use this trait if the output doesn't borrow from Python" rather than "you need higher ranked trait bounds".
Thanks! I think that's a great idea. It's essentially the same thing, but it's much easier to grasp by giving it a distinct name, plus we can better document it in the API docs. I will adapt this to make use of that.
Thanks for the review! You are right, the choice of Borrowed over &Bound is definitely something to discuss about. I used it here because it was the final form of FromPyObjectBound that we landed on. Looking back, the change to Borrowed happened in #3959, which was related to the gil-refs API, so it might not be strictly necessary anymore (haven't checked). The points you mentioned seem like good justification why it still makes sense to introduce additional complexity here.
I think the upsides justify this, though we might want to increase the quality of documentation and examples on
Borrowed(which I'd left quite minimal so far).
I agree, if we put Borrowed in such a core part of the infrastructure it deserves a documentation overhaul. We might also want to take a look at the current (public) API and additionally expose some helpers. Something like to_any comes to mind, which is currently internal only.
Thanks for the feedback! I took another pass over it and tried to make it a bit more concise. I do however think that it's worth to talk about the individual lifetimes at play here, since this trait is right in the center of PyO3. I moved the paragraph about collections into the FromPyObjectOwned docs, because it works much better there with the code example, and replaced it with a smaller note for more info. I reworded my examples a bit and moved them under a # Details section, maybe this works better?
Let me know what you think about this.
I believe this now also finishes the _bound methods deprecations 🎉
Fantastic, thank you so much for your help on pushing that over the line. When I get some time I'll try to write up a list of what we might have left for 0.23 (hopefully not much).
Ah, I just started playing with this locally and found one additional finding: having Borrowed<'a, 'py, T> as the input type means that it's impossible to implement FromPyObject<'a, 'py> for &'a Bound<'py, T> - because the input borrowed can't be cast into a &'a Bound<'py, T> for a sufficient lifetime (pointer goes in, pointer-to-pointer goes out).
Is that a problem? I think not, because that's what .downcast() is for (and it's potentially nice to separate them like this), however just worth a brief double-check that others agree.
Oh, and one more case which comes up in local testing. At the moment from_py_with extractors take &Bound<'py, PyAny>. Should we be considering changing those to Borrowed too, for consistency?
Should we be considering changing those to Borrowed too, for consistency?
It feels to me like we might want to consider macro tricks such that users can use both &Bound and Borrowed.
having
Borrowed<'a, 'py, T>as the input type means that it's impossible to implementFromPyObject<'a, 'py> for &'a Bound<'py, T>
That's true, but as you mention below, I actually think this a good thing and makes a clearer use case between .extract() and .downcast().
Oh, and one more case which comes up in local testing. At the moment
from_py_withextractors take&Bound<'py, PyAny>. Should we be considering changing those toBorrowedtoo, for consistency?
Hmm, good question. I guess that is part of the question about how prominent we want Borrowed to be. Providing some flexibility via macro magic as you suggested sounds like a good compromise. I can explore that as a followup.
Yes agreed, it's definitely a follow-up to make changes to expand from_py_with.
I wonder, are there any other options we have here? E.g. does it work to have a trait with both extract and extract,_borroeed, maybe with a default implementation of extract,_borrowed? I suspect that might not work because of the same reason we can't currently have &Bound as an output.
I am still feeling a little uneasy about the choice to commit Borrowed more directly into the face of users. I think it is probably the right choice still, as long as we improve the documentation and API on borrowed. Maybe I will quickly try to get the perf wins out of extract_argument.rs later today to add another data point.
And (sorry to keep dangling ideas on here), inspired by the new IntoPyObject, I have two further perf related thoughts while we are making breaking changes here:
- In the past we have noted that the
PyResultreturn type forced conversion of errors toPyErrand made.extract()a performance footgun compared to.downcast(). There is still this problem forFromPyObject for Bound<T>. Should we add atype Errorin the same way we are adding forIntoPyObject? - It is possible to call
.extract()on e.g.Bound<PyString>to get aString, but this comes at the cost of a redundant Python type check becauseFromPyObjectalways works onPyAny. Could we add atype Source, and add a way to makeBound<T>directly offerextract()whereU: FromPyObject<Source = T>?
Both changes would significantly complicate the trait. I wonder, is there a way that we can support a simple trait like FromPyObject and a more advanced FromPyObjectImpl, which has a blanket from FromPyObject, so users can just implement the simple trait if that's good enough?
All food for thought and I think worth discussing while we're committing to breaking changes here.
And (sorry to keep dangling ideas on here), inspired by the new
IntoPyObject, I have two further perf related thoughts while we are making breaking changes here:
No problem 😄
- In the past we have noted that the
PyResultreturn type forced conversion of errors toPyErrand made.extract()a performance footgun compared to.downcast(). There is still this problem forFromPyObject for Bound<T>. Should we add atype Errorin the same way we are adding forIntoPyObject?
If we're willing to do more breaking changes here, that sounds like a good idea to me. The only downsides I can see
- sadly we can't provide a default type
- similar to
IntoPyObjectwe would wantFromPyObject::Error: Into<PyErr>, so this has the same effect on generic code and may tie into the same proposal you raised in https://github.com/PyO3/pyo3/pull/4489#discussion_r1730272529
- It is possible to call
.extract()on e.g.Bound<PyString>to get aString, but this comes at the cost of a redundant Python type check becauseFromPyObjectalways works onPyAny. Could we add atype Source, and add a way to makeBound<T>directly offerextract()whereU: FromPyObject<Source = T>?
This would be really cool indeed. I guess we would maybe somehow need to take advantage of method precedence between trait and inherent methods, to make the fall through to PyAny work...
If we're breaking FromPyObject, I wonder if we can solve #2888 similar to how we handled bytes in IntoPyObject 🤔
That would make a very nice symmetry. I think if we do it, then it should work for all containers, which would probably mean one hidden method per container (vec, smallvec, arrays), maybe these could be unified with "return position impl trait in trait" (with something like impl Iterator) once MSRV allows. The more challenging thing seems to be to restrict a default implementation in FromPyObject to FromPyObjectOwned...
If we're breaking
FromPyObject, I wonder if we can solve #2888 similar to how we handled bytes inIntoPyObject🤔
I've looked into this again. I think we would (for example) want something like this
trait FromPyObject<'a, 'py>: Sized {
// snip
#[doc(hidden)]
fn extract_array<const N: usize>(
obj: Borrowed<'_, 'py, PyAny>,
_: private::Token,
) -> PyResult<[Self; N]>
where
Self: FromPyObjectOwned<'py>,
{
todo!()
}
}
which we then use inside of extract on the array implementation and overwrite on the u8 implementation.
Unfortunately this currently does not compile. I believe this is the corresponding rustc issue: https://github.com/rust-lang/rust/issues/34979, which refers to the next trait solver as the solution and -Znext-solver does indeed make this work. Unless there is something I have overseen, I think we have to wait with the bytes specialization until -Znext-solver stabilizes.
Ah, nice research 👍. Agreed we can make a reminder for ourselves to improve bytes specializations once MSRV uses the next solve (gonna be a while 😂).
Given the early reception of IntoPyObject seems to be that it's hard, I'm currently feeling like we would build up some goodwill by being gentler on users for a few releases so am feeling like backing down on my other proposals for advanced optimisations above. We can introduce them incrementally in the future perhaps.
This branch really deserves to be merged asap, so please forgive me the delay, I think this now is highest thing on my to-do perhaps excluding a 0.23.1 patch release. I will try to play around with this branch and see how I feel about the ergonomics & perf, maybe once the kids are in bed tonight.
Given the early reception of
IntoPyObjectseems to be that it's hard, I'm currently feeling like we would build up some goodwill by being gentler on users for a few releases so am feeling like backing down on my other proposals for advanced optimisations above. We can introduce them incrementally in the future perhaps.
I tend to agree here. We can maybe think about adding the Error type. That seem pretty low hanging and fairly intuitive.
This branch really deserves to be merged asap, so please forgive me the delay,
Absolutely no worries :)
One idea which struck me, what if we keep FromPyObject with just a single lifetime and &Bound as input, and we instead expose what's currently called FromPyObjectBound as FromPyObjectBorrowed?
The idea would be that here we are currently breaking the main trait to get to two lifetimes, and we are introducing FromPyObjectOwned for simpler trait bounds, which kinda makes sense because that matches serde's DeserializeOwned.
But I think unlike serde's DeserializeOwned, most uses of FromPyObject do not want care about the second lifetime, so maybe it makes sense to keep that to a specialised trait .
This idea might have the nice boost that it's probably less breaking than adding the second lifetime to FromPyObject.
I feel like there's some detail about PyRef and second lifetimes which I might have overlooked with this idea. But it still seems interesting to think about further.
Interesting thought, I have't played around with it yet, but I opened with #4720 based on this including the changes adding the second lifetime to PyRef and PyRefMut (was quite a bit of lifetime reviewing until the borrow checker was happy 😅 )
Maybe we can use that to play around with and judge whats the best option here. My initial feeling says that the one single trait approach might have a bigger hurdle in the beginning but is easier in the long run (less traits, using common patters) but I haven't tried the other approach yet.
I did play around with the idea a bit. It looks like it is possible to stick the PyRef(Mut) changes from #4720 onto it without too many changes. However from a usability perspective it doesn't feel that great IMO. With the two trait approach one has to remember to implement FromPyObject if possible, because it is more general due to the blanket, but for uses in trait bounds FromPyObjectBorrowed should be used (if possible), because it includes more types. So by default one would need two different traits, depending on the context it is used in. I think there is a good chance that users will get this wrong and use a more restrictive bound than necessary (both in impls and trait bounds). I'm curious what you think about this.
Hmm, interesting. One nice thing about
one has to remember to implement
FromPyObjectif possible
is that I'd guess most users would choose to implement it anyway as it's the simpler to implement. It's quite appealing to me that the default is simple. Plus it's less breaking than adding the lifetime & using Borrowed as the input.
I also feel like it's probably quite rare to have a case where FromPyObjectBorrowed trait bound is actually critical. At worst case it probably forces an extra clone or reference counting operation if the bound is FromPyObject instead? And it's so hard to soundly get borrowed references from python operations (reading from a tuple is possibly the only way?) that probably most cases where a FromPyObjectBorrowed bound might actually help, it's probably where the caller already has the reference and could probably just use .extract() up front.
I'm quite happy to be wrong here, though my gut feeling is that if cheap-and-cheerful FromPyObject works 99% of the time and FromPyObjectBorrowed is the power tool to use for edge cases which borrow their input python objects then that might be an easier library for users to learn.
I though about this some more recently while trying to add FromPyObject::Error (independently from this). It turns out that having the traits split (like in main) kind of reduces the usefulness of such an error type. Often times the error could be DowncastError, but for that we need to have the second lifetime. So we either have to convert to PyErr, which we want to avoid, or switch to FromPyObjectBound which causes quite a few problem with the trait bounds.
Together with #4965 I think that two separate traits for this is too messy and confusing. The full switch proposed here has less edge cases/is more consistent and is easier to reason about IMO.
Sorry to have taken so long to circle back here.
I though about this some more recently while trying to add
FromPyObject::Error(independently from this). It turns out that having the traits split (like in main) kind of reduces the usefulness of such an error type. Often times the error could beDowncastError, but for that we need to have the second lifetime.
I guess that a new possibility since March might be that we could have use GATs and have type Error<'a>;, which should allow the error type to have a second lifetime even if the main trait didn't. But maybe there is a painful mismatch when then trying to pair that with a second trait which has a second lifetime.
If we're breaking
FromPyObject, I wonder if we can solve #2888 similar to how we handled bytes inIntoPyObject🤔I've looked into this again. I think we would (for example) want something like this
trait FromPyObject<'a, 'py>: Sized { // snip #[doc(hidden)] fn extract_array<const N: usize>( obj: Borrowed<'_, 'py, PyAny>, _: private::Token, ) -> PyResult<[Self; N]> where Self: FromPyObjectOwned<'py>, { todo!() } }which we then use inside of
extracton the array implementation and overwrite on theu8implementation.Unfortunately this currently does not compile. I believe this is the corresponding
rustcissue: rust-lang/rust#34979, which refers to the next trait solver as the solution and-Znext-solverdoes indeed make this work. Unless there is something I have overseen, I think we have to wait with the bytes specialization until-Znext-solverstabilizes.
I also just had the insight that if we don't have the second lifetime, then it compiles. i.e. this is fine:
trait FromPyObject<'py>: Sized {
// snip
#[doc(hidden)]
fn extract_array<const N: usize>(
obj: Borrowed<'_, 'py, PyAny>,
_: private::Token,
) -> PyResult<[Self; N]>
where
Self: FromPyObject<'py>,
{
todo!()
}
}
... interestingly, it seems like this is also true if we ended up replacing the lifetime as FromPyObject<'a> as per your idea in https://github.com/PyO3/pyo3/pull/4720#issuecomment-2495978288
I do quite like FromPyObject not having multiple lifetimes from a user simplicity perspective. So I am very tempted to see how this might look now; I have some time to experiment tonight / tomorrow morning.
... interestingly, it seems like this is also true if we ended up replacing the lifetime as
FromPyObject<'a>as per your idea in #4720 (comment)
Another angle on this idea: We could switch the lifetime like so
pub trait FromPyObject<'a>: Sized {
type Error: Into<PyErr>;
fn extract<'py>(obj: &'a Bound<'py, PyAny>) -> Result<Self, Self::Error>;
}
which should be able to support any type (including all which currently need FromPyObjectBound) which does not itself depend on 'py. This leaves mainly Bound and Borrowed (and PyRef(Mut), but if we replace them with PyClassGuard(Mut) as in #5233 this goes away) which we could move to PyFunctionArgument like is already done for &Bound. The main downsides are
- we would loose the ability to extract them inside of containers i.e.
Vec<Bound<'py, T>>, given thatVec<Py<T>>(or evenVec<PyClassGuard<'a, T>>in the future) would still be possible, maybe that's not too bad? - user defined types could no longer include
Bounds onlyPys, but they would gain the ability for borrow from'a
Just some thoughts, I'm still not really sure what direction we should take here. I still kind of want to get rid of at least one trait here. Having PyFunctionArgument, FromPyObject and FromPyObjectBound (or equivalent) that all kind of do the same thing makes it kind of hard to follow which types work where, what to implement and what to use as a trait bound...
I do quite like
FromPyObjectnot having multiple lifetimes from a user simplicity perspective.
I generally agree, but on the other hand the double lifetime approach is still the most flexible option, with the least "surprises". Having the functionality split only really hides the complexity for the easy case while often making it worse for the cases that need the additional flexibility. I'm kind of torn here...
TLDR; I have thought hard about this over the last week and think this proposed design is the right one and we should move forward with it 👍
I think given the scale of the break, let's release PyO3 0.26 first ASAP to get the with_gil rename shipped and then get this merged. In the meanwhile we can get this PR up-to-date with main and give it a few review passes.
I've been playing around with this branch and variants like switching the lifetime proposed just above.
The downside of switching the lifetime is it's pretty breaking:
- without reintroducing another trait (
FromPyObjectAttached) then all extraction ofBound<'py, T>straight up breaks - we still have trouble with containers which then need to implement
FromPyObjectAttachedand then it turns out thatFromPyObjectbecomes generally useless as a trait bound because of the same two-trait split issue you highlighted in https://github.com/PyO3/pyo3/pull/4390#issuecomment-2495455971
In general I've come to the same conclusion; a two-trait system is a usability issue, arguably a bit worse even than having the two lifetimes as proposed here.
I'm also reasonably convinced that Borrowed<'a, 'py, T> is correct over &'a Bound<'py, T>, because otherwise .extract() of &[u8] from Py<PyBytes> (for example) is arbitrarily restricted by implicit 'py: 'a of the ref-to-bound. Also, we need to use Borrowed because otherwise we lose the ability to support tuple extraction.
I also found that this proposal is compatible with optimized Vec<u8> extraction - see #5244
So in conclusion:
- I think the trait as implemented here is the right form; it's the only one with full flexibility
- The UX hurdle is the two lifetimes, hopefully we can solve this with good documentation both on the trait and in the migration guide.
- The (smaller) UX hurdle is the use of
Borrowed; again I think we should just embrace this and put good documentation on the type.
So let's move forward here with this PR? I'm happy to take on a separate PR to add documentation to Borrowed 👍
(And I'll hope that maybe in the long run the 'py lifetime might be replaced by more advanced Rust language features so everything simplifies 😂)
Ah, and while we're updating this branch I think we should get the Error type in.
I do have a rebased version of this locally, and an additional commit for the error type on top, which I can push here shortly.
CodSpeed Performance Report
Merging #4390 will improve performances by ×10
Comparing Icxolu:from-pyobject (06c37c9) with main (55d379c)
Summary
⚡ 1 improvements
✅ 92 untouched benchmarks
Benchmarks breakdown
| Benchmark | BASE |
HEAD |
Change | |
|---|---|---|---|---|
| ⚡ | not_a_list_via_extract |
1,856.7 ns | 182.2 ns | ×10 |