pyo3 icon indicating copy to clipboard operation
pyo3 copied to clipboard

add second lifetime to `FromPyObject`

Open Icxolu opened this issue 1 year ago • 23 comments

This adds the second lifetime to FromPyObject and removes FromPyObjectBound. I hope I adjusted all the bounds correctly. I believe the equivalent to our current implementation is use HRTB for the input lifetime on any generic bound. But this should only be necessary for containers that create temporary Python references during extraction. For wrapper types it should be possible to just forward the relaxed bound.

For easier trait bounds FromPyObjectOwned is introduced. This is blanket implemented for any FromPyObject type that does not depend on the input lifetime. It is intended to be used as a trait bound, the idea is inspired by serde (Deserialize <=> DeserializeOwned)

I tried to document different cases in the migration guide, but it can probably still be extended. ~Changelog entry is still missing, will write that tomorrow.~

Breaking changes:

  • the second lifetime
  • the extract method without default
  • the addition lifetime bound on extract_bound to make the default work

Icxolu avatar Jul 28 '24 22:07 Icxolu

I propose that we add a FromPyObjectOwned trait similar to serde's DeserializeOwned. It's much easier to explain "just use this trait if the output doesn't borrow from Python" rather than "you need higher ranked trait bounds".

mejrs avatar Jul 28 '24 23:07 mejrs

Thanks! I think that's a great idea. It's essentially the same thing, but it's much easier to grasp by giving it a distinct name, plus we can better document it in the API docs. I will adapt this to make use of that.

Icxolu avatar Jul 29 '24 16:07 Icxolu

Thanks for the review! You are right, the choice of Borrowed over &Bound is definitely something to discuss about. I used it here because it was the final form of FromPyObjectBound that we landed on. Looking back, the change to Borrowed happened in #3959, which was related to the gil-refs API, so it might not be strictly necessary anymore (haven't checked). The points you mentioned seem like good justification why it still makes sense to introduce additional complexity here.

I think the upsides justify this, though we might want to increase the quality of documentation and examples on Borrowed (which I'd left quite minimal so far).

I agree, if we put Borrowed in such a core part of the infrastructure it deserves a documentation overhaul. We might also want to take a look at the current (public) API and additionally expose some helpers. Something like to_any comes to mind, which is currently internal only.

Icxolu avatar Aug 03 '24 19:08 Icxolu

Thanks for the feedback! I took another pass over it and tried to make it a bit more concise. I do however think that it's worth to talk about the individual lifetimes at play here, since this trait is right in the center of PyO3. I moved the paragraph about collections into the FromPyObjectOwned docs, because it works much better there with the code example, and replaced it with a smaller note for more info. I reworded my examples a bit and moved them under a # Details section, maybe this works better?

Let me know what you think about this.

Icxolu avatar Aug 04 '24 17:08 Icxolu

I believe this now also finishes the _bound methods deprecations 🎉

Icxolu avatar Aug 31 '24 15:08 Icxolu

Fantastic, thank you so much for your help on pushing that over the line. When I get some time I'll try to write up a list of what we might have left for 0.23 (hopefully not much).

davidhewitt avatar Aug 31 '24 19:08 davidhewitt

Ah, I just started playing with this locally and found one additional finding: having Borrowed<'a, 'py, T> as the input type means that it's impossible to implement FromPyObject<'a, 'py> for &'a Bound<'py, T> - because the input borrowed can't be cast into a &'a Bound<'py, T> for a sufficient lifetime (pointer goes in, pointer-to-pointer goes out).

Is that a problem? I think not, because that's what .downcast() is for (and it's potentially nice to separate them like this), however just worth a brief double-check that others agree.

davidhewitt avatar Aug 31 '24 20:08 davidhewitt

Oh, and one more case which comes up in local testing. At the moment from_py_with extractors take &Bound<'py, PyAny>. Should we be considering changing those to Borrowed too, for consistency?

davidhewitt avatar Aug 31 '24 20:08 davidhewitt

Should we be considering changing those to Borrowed too, for consistency?

It feels to me like we might want to consider macro tricks such that users can use both &Bound and Borrowed.

davidhewitt avatar Aug 31 '24 20:08 davidhewitt

having Borrowed<'a, 'py, T> as the input type means that it's impossible to implement FromPyObject<'a, 'py> for &'a Bound<'py, T>

That's true, but as you mention below, I actually think this a good thing and makes a clearer use case between .extract() and .downcast().

Oh, and one more case which comes up in local testing. At the moment from_py_with extractors take &Bound<'py, PyAny>. Should we be considering changing those to Borrowed too, for consistency?

Hmm, good question. I guess that is part of the question about how prominent we want Borrowed to be. Providing some flexibility via macro magic as you suggested sounds like a good compromise. I can explore that as a followup.

Icxolu avatar Sep 01 '24 00:09 Icxolu

Yes agreed, it's definitely a follow-up to make changes to expand from_py_with.

I wonder, are there any other options we have here? E.g. does it work to have a trait with both extract and extract,_borroeed, maybe with a default implementation of extract,_borrowed? I suspect that might not work because of the same reason we can't currently have &Bound as an output.

I am still feeling a little uneasy about the choice to commit Borrowed more directly into the face of users. I think it is probably the right choice still, as long as we improve the documentation and API on borrowed. Maybe I will quickly try to get the perf wins out of extract_argument.rs later today to add another data point.

davidhewitt avatar Sep 01 '24 06:09 davidhewitt

And (sorry to keep dangling ideas on here), inspired by the new IntoPyObject, I have two further perf related thoughts while we are making breaking changes here:

  • In the past we have noted that the PyResult return type forced conversion of errors to PyErr and made .extract() a performance footgun compared to .downcast(). There is still this problem for FromPyObject for Bound<T>. Should we add a type Error in the same way we are adding for IntoPyObject?
  • It is possible to call .extract() on e.g. Bound<PyString> to get a String, but this comes at the cost of a redundant Python type check because FromPyObject always works on PyAny. Could we add a type Source, and add a way to make Bound<T> directly offer extract() where U: FromPyObject<Source = T>?

Both changes would significantly complicate the trait. I wonder, is there a way that we can support a simple trait like FromPyObject and a more advanced FromPyObjectImpl, which has a blanket from FromPyObject, so users can just implement the simple trait if that's good enough?

All food for thought and I think worth discussing while we're committing to breaking changes here.

davidhewitt avatar Sep 01 '24 09:09 davidhewitt

And (sorry to keep dangling ideas on here), inspired by the new IntoPyObject, I have two further perf related thoughts while we are making breaking changes here:

No problem 😄

  • In the past we have noted that the PyResult return type forced conversion of errors to PyErr and made .extract() a performance footgun compared to .downcast(). There is still this problem for FromPyObject for Bound<T>. Should we add a type Error in the same way we are adding for IntoPyObject?

If we're willing to do more breaking changes here, that sounds like a good idea to me. The only downsides I can see

  • sadly we can't provide a default type
  • similar to IntoPyObject we would want FromPyObject::Error: Into<PyErr>, so this has the same effect on generic code and may tie into the same proposal you raised in https://github.com/PyO3/pyo3/pull/4489#discussion_r1730272529
  • It is possible to call .extract() on e.g. Bound<PyString> to get a String, but this comes at the cost of a redundant Python type check because FromPyObject always works on PyAny. Could we add a type Source, and add a way to make Bound<T> directly offer extract() where U: FromPyObject<Source = T>?

This would be really cool indeed. I guess we would maybe somehow need to take advantage of method precedence between trait and inherent methods, to make the fall through to PyAny work...

Icxolu avatar Sep 01 '24 12:09 Icxolu

If we're breaking FromPyObject, I wonder if we can solve #2888 similar to how we handled bytes in IntoPyObject 🤔

davidhewitt avatar Oct 11 '24 16:10 davidhewitt

That would make a very nice symmetry. I think if we do it, then it should work for all containers, which would probably mean one hidden method per container (vec, smallvec, arrays), maybe these could be unified with "return position impl trait in trait" (with something like impl Iterator) once MSRV allows. The more challenging thing seems to be to restrict a default implementation in FromPyObject to FromPyObjectOwned...

Icxolu avatar Oct 12 '24 12:10 Icxolu

If we're breaking FromPyObject, I wonder if we can solve #2888 similar to how we handled bytes in IntoPyObject 🤔

I've looked into this again. I think we would (for example) want something like this

trait FromPyObject<'a, 'py>: Sized {
   // snip

    #[doc(hidden)]
    fn extract_array<const N: usize>(
        obj: Borrowed<'_, 'py, PyAny>,
        _: private::Token,
    ) -> PyResult<[Self; N]>
    where
        Self: FromPyObjectOwned<'py>,
    {
        todo!()
    }
}

which we then use inside of extract on the array implementation and overwrite on the u8 implementation.

Unfortunately this currently does not compile. I believe this is the corresponding rustc issue: https://github.com/rust-lang/rust/issues/34979, which refers to the next trait solver as the solution and -Znext-solver does indeed make this work. Unless there is something I have overseen, I think we have to wait with the bytes specialization until -Znext-solver stabilizes.

Icxolu avatar Nov 16 '24 10:11 Icxolu

Ah, nice research 👍. Agreed we can make a reminder for ourselves to improve bytes specializations once MSRV uses the next solve (gonna be a while 😂).

Given the early reception of IntoPyObject seems to be that it's hard, I'm currently feeling like we would build up some goodwill by being gentler on users for a few releases so am feeling like backing down on my other proposals for advanced optimisations above. We can introduce them incrementally in the future perhaps.

This branch really deserves to be merged asap, so please forgive me the delay, I think this now is highest thing on my to-do perhaps excluding a 0.23.1 patch release. I will try to play around with this branch and see how I feel about the ergonomics & perf, maybe once the kids are in bed tonight.

davidhewitt avatar Nov 16 '24 12:11 davidhewitt

Given the early reception of IntoPyObject seems to be that it's hard, I'm currently feeling like we would build up some goodwill by being gentler on users for a few releases so am feeling like backing down on my other proposals for advanced optimisations above. We can introduce them incrementally in the future perhaps.

I tend to agree here. We can maybe think about adding the Error type. That seem pretty low hanging and fairly intuitive.

This branch really deserves to be merged asap, so please forgive me the delay,

Absolutely no worries :)

Icxolu avatar Nov 16 '24 14:11 Icxolu

One idea which struck me, what if we keep FromPyObject with just a single lifetime and &Bound as input, and we instead expose what's currently called FromPyObjectBound as FromPyObjectBorrowed?

The idea would be that here we are currently breaking the main trait to get to two lifetimes, and we are introducing FromPyObjectOwned for simpler trait bounds, which kinda makes sense because that matches serde's DeserializeOwned.

But I think unlike serde's DeserializeOwned, most uses of FromPyObject do not want care about the second lifetime, so maybe it makes sense to keep that to a specialised trait .

This idea might have the nice boost that it's probably less breaking than adding the second lifetime to FromPyObject.

I feel like there's some detail about PyRef and second lifetimes which I might have overlooked with this idea. But it still seems interesting to think about further.

davidhewitt avatar Nov 17 '24 23:11 davidhewitt

Interesting thought, I have't played around with it yet, but I opened with #4720 based on this including the changes adding the second lifetime to PyRef and PyRefMut (was quite a bit of lifetime reviewing until the borrow checker was happy 😅 )

Maybe we can use that to play around with and judge whats the best option here. My initial feeling says that the one single trait approach might have a bigger hurdle in the beginning but is easier in the long run (less traits, using common patters) but I haven't tried the other approach yet.

Icxolu avatar Nov 19 '24 18:11 Icxolu

I did play around with the idea a bit. It looks like it is possible to stick the PyRef(Mut) changes from #4720 onto it without too many changes. However from a usability perspective it doesn't feel that great IMO. With the two trait approach one has to remember to implement FromPyObject if possible, because it is more general due to the blanket, but for uses in trait bounds FromPyObjectBorrowed should be used (if possible), because it includes more types. So by default one would need two different traits, depending on the context it is used in. I think there is a good chance that users will get this wrong and use a more restrictive bound than necessary (both in impls and trait bounds). I'm curious what you think about this.

Icxolu avatar Nov 23 '24 12:11 Icxolu

Hmm, interesting. One nice thing about

one has to remember to implement FromPyObject if possible

is that I'd guess most users would choose to implement it anyway as it's the simpler to implement. It's quite appealing to me that the default is simple. Plus it's less breaking than adding the lifetime & using Borrowed as the input.

I also feel like it's probably quite rare to have a case where FromPyObjectBorrowed trait bound is actually critical. At worst case it probably forces an extra clone or reference counting operation if the bound is FromPyObject instead? And it's so hard to soundly get borrowed references from python operations (reading from a tuple is possibly the only way?) that probably most cases where a FromPyObjectBorrowed bound might actually help, it's probably where the caller already has the reference and could probably just use .extract() up front.

I'm quite happy to be wrong here, though my gut feeling is that if cheap-and-cheerful FromPyObject works 99% of the time and FromPyObjectBorrowed is the power tool to use for edge cases which borrow their input python objects then that might be an easier library for users to learn.

davidhewitt avatar Nov 23 '24 22:11 davidhewitt

I though about this some more recently while trying to add FromPyObject::Error (independently from this). It turns out that having the traits split (like in main) kind of reduces the usefulness of such an error type. Often times the error could be DowncastError, but for that we need to have the second lifetime. So we either have to convert to PyErr, which we want to avoid, or switch to FromPyObjectBound which causes quite a few problem with the trait bounds.

Together with #4965 I think that two separate traits for this is too messy and confusing. The full switch proposed here has less edge cases/is more consistent and is easier to reason about IMO.

Icxolu avatar Mar 10 '25 22:03 Icxolu

Sorry to have taken so long to circle back here.

I though about this some more recently while trying to add FromPyObject::Error (independently from this). It turns out that having the traits split (like in main) kind of reduces the usefulness of such an error type. Often times the error could be DowncastError, but for that we need to have the second lifetime.

I guess that a new possibility since March might be that we could have use GATs and have type Error<'a>;, which should allow the error type to have a second lifetime even if the main trait didn't. But maybe there is a painful mismatch when then trying to pair that with a second trait which has a second lifetime.

If we're breaking FromPyObject, I wonder if we can solve #2888 similar to how we handled bytes in IntoPyObject 🤔

I've looked into this again. I think we would (for example) want something like this

trait FromPyObject<'a, 'py>: Sized {
   // snip

    #[doc(hidden)]
    fn extract_array<const N: usize>(
        obj: Borrowed<'_, 'py, PyAny>,
        _: private::Token,
    ) -> PyResult<[Self; N]>
    where
        Self: FromPyObjectOwned<'py>,
    {
        todo!()
    }
}

which we then use inside of extract on the array implementation and overwrite on the u8 implementation.

Unfortunately this currently does not compile. I believe this is the corresponding rustc issue: rust-lang/rust#34979, which refers to the next trait solver as the solution and -Znext-solver does indeed make this work. Unless there is something I have overseen, I think we have to wait with the bytes specialization until -Znext-solver stabilizes.

I also just had the insight that if we don't have the second lifetime, then it compiles. i.e. this is fine:

trait FromPyObject<'py>: Sized {
   // snip

    #[doc(hidden)]
    fn extract_array<const N: usize>(
        obj: Borrowed<'_, 'py, PyAny>,
        _: private::Token,
    ) -> PyResult<[Self; N]>
    where
        Self: FromPyObject<'py>,
    {
        todo!()
    }
}

... interestingly, it seems like this is also true if we ended up replacing the lifetime as FromPyObject<'a> as per your idea in https://github.com/PyO3/pyo3/pull/4720#issuecomment-2495978288

I do quite like FromPyObject not having multiple lifetimes from a user simplicity perspective. So I am very tempted to see how this might look now; I have some time to experiment tonight / tomorrow morning.

davidhewitt avatar Jul 11 '25 19:07 davidhewitt

... interestingly, it seems like this is also true if we ended up replacing the lifetime as FromPyObject<'a> as per your idea in #4720 (comment)

Another angle on this idea: We could switch the lifetime like so

pub trait FromPyObject<'a>: Sized {
    type Error: Into<PyErr>;
    fn extract<'py>(obj: &'a Bound<'py, PyAny>) -> Result<Self, Self::Error>;
}

which should be able to support any type (including all which currently need FromPyObjectBound) which does not itself depend on 'py. This leaves mainly Bound and Borrowed (and PyRef(Mut), but if we replace them with PyClassGuard(Mut) as in #5233 this goes away) which we could move to PyFunctionArgument like is already done for &Bound. The main downsides are

  • we would loose the ability to extract them inside of containers i.e. Vec<Bound<'py, T>>, given that Vec<Py<T>> (or even Vec<PyClassGuard<'a, T>> in the future) would still be possible, maybe that's not too bad?
  • user defined types could no longer include Bounds only Pys, but they would gain the ability for borrow from 'a

Just some thoughts, I'm still not really sure what direction we should take here. I still kind of want to get rid of at least one trait here. Having PyFunctionArgument, FromPyObject and FromPyObjectBound (or equivalent) that all kind of do the same thing makes it kind of hard to follow which types work where, what to implement and what to use as a trait bound...

I do quite like FromPyObject not having multiple lifetimes from a user simplicity perspective.

I generally agree, but on the other hand the double lifetime approach is still the most flexible option, with the least "surprises". Having the functionality split only really hides the complexity for the easy case while often making it worse for the cases that need the additional flexibility. I'm kind of torn here...

Icxolu avatar Jul 13 '25 21:07 Icxolu

TLDR; I have thought hard about this over the last week and think this proposed design is the right one and we should move forward with it 👍

I think given the scale of the break, let's release PyO3 0.26 first ASAP to get the with_gil rename shipped and then get this merged. In the meanwhile we can get this PR up-to-date with main and give it a few review passes.


I've been playing around with this branch and variants like switching the lifetime proposed just above.

The downside of switching the lifetime is it's pretty breaking:

  • without reintroducing another trait (FromPyObjectAttached) then all extraction of Bound<'py, T> straight up breaks
  • we still have trouble with containers which then need to implement FromPyObjectAttached and then it turns out that FromPyObject becomes generally useless as a trait bound because of the same two-trait split issue you highlighted in https://github.com/PyO3/pyo3/pull/4390#issuecomment-2495455971

In general I've come to the same conclusion; a two-trait system is a usability issue, arguably a bit worse even than having the two lifetimes as proposed here.

I'm also reasonably convinced that Borrowed<'a, 'py, T> is correct over &'a Bound<'py, T>, because otherwise .extract() of &[u8] from Py<PyBytes> (for example) is arbitrarily restricted by implicit 'py: 'a of the ref-to-bound. Also, we need to use Borrowed because otherwise we lose the ability to support tuple extraction.

I also found that this proposal is compatible with optimized Vec<u8> extraction - see #5244


So in conclusion:

  • I think the trait as implemented here is the right form; it's the only one with full flexibility
  • The UX hurdle is the two lifetimes, hopefully we can solve this with good documentation both on the trait and in the migration guide.
  • The (smaller) UX hurdle is the use of Borrowed; again I think we should just embrace this and put good documentation on the type.

So let's move forward here with this PR? I'm happy to take on a separate PR to add documentation to Borrowed 👍

davidhewitt avatar Jul 18 '25 17:07 davidhewitt

(And I'll hope that maybe in the long run the 'py lifetime might be replaced by more advanced Rust language features so everything simplifies 😂)

davidhewitt avatar Jul 18 '25 17:07 davidhewitt

Ah, and while we're updating this branch I think we should get the Error type in.

davidhewitt avatar Jul 18 '25 17:07 davidhewitt

I do have a rebased version of this locally, and an additional commit for the error type on top, which I can push here shortly.

Icxolu avatar Jul 18 '25 17:07 Icxolu

CodSpeed Performance Report

Merging #4390 will improve performances by ×10

Comparing Icxolu:from-pyobject (06c37c9) with main (55d379c)

Summary

⚡ 1 improvements
✅ 92 untouched benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
not_a_list_via_extract 1,856.7 ns 182.2 ns ×10

codspeed-hq[bot] avatar Jul 18 '25 18:07 codspeed-hq[bot]