pyo3 icon indicating copy to clipboard operation
pyo3 copied to clipboard

Consider making it possible to borrow `PyRef` in `#[pyclass]`

Open davidhewitt opened this issue 3 years ago • 20 comments

At the moment, as #1088 notes (and also #1085), it's difficult to expose things like iterators to Python because of the restriction that #[pyclass] cannot take a lifetime.

This restriction obviously makes sense because Rust-managed data cannot be borrowed and then exposed to Python; Python will allow this data to live arbitrarily long.

However, data that's already owned by Python, e.g. in a PyCell<T>, can be safely "borrowed" and stored in another #[pyclass]. We do this already, but only for the lifetime-free case Py<T>.

It would be incredibly powerful if PyRef<'a, T> (or some similar construct) was able to be stored inside another #[pyclass]. This would make make wrapping arbitrary iterators, for example, much easier.

This playground shows that with some carefully-placed transmutes it's possible to hack the type system to at least achieve the desired affect: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=041f9a6f5dfa3a5fc59ece7105cc4dfd

That code is a wildly unsafe sketch that doesn't actually use the PyO3 types, so please don't try to use it as-is. You're responsible for all segfaults if you do 😄. If you're finding yourself in need of this feature, I can offer mentoring to help design how PyO3 could make such a trick safe and ergonomic.

davidhewitt avatar Aug 07 '20 11:08 davidhewitt

Hitting #1088 when I'm trying to expose crfs::Model which has a lifetime parameter to Python. It would be great if this code works when data is a Python managed buffer.

#[pyclass]
struct Model<'a> {
    data: &'a [u8],  // or maybe &'a PyBytes
    model: crfs::Model<'a>,
}

messense avatar Mar 09 '21 15:03 messense

Last time I looked at this I concluded that having the lifetime parameter on the #[pyclass] is impossible, however something along this kind of line could work:

#[pyclass]
struct PyModel {
    data: PyBytes
    accessor: Box<dyn for<'py> Fn(Python<'py>) -> crfs::Model<'py>>
}

Rather than Box<dyn Fn> it would be nice to have a trait, but I think it might require generic associated types.

I haven't thought too hard in this space. I'm sure that something can be possible but it just needs research!

davidhewitt avatar Mar 09 '21 21:03 davidhewitt

That looks a lot like what ouroboros does for self-referential struct.

messense avatar Mar 10 '21 02:03 messense

Looks interesting 👀 Maybe we can extend #[new] to be enabled to take Python<'py>.

kngwyu avatar Mar 10 '21 12:03 kngwyu

I was able to get something working with ouroboros using self-referential struct to remove the lifetime parameter: https://github.com/messense/crfs-rs/pull/3

messense avatar Mar 10 '21 14:03 messense

That's really cool! I think we might be able to write an iterator example using that 🚀 (#1085)

davidhewitt avatar Mar 13 '21 23:03 davidhewitt

So, the natural extension of this would be if, instead of copying to a Vec<u8>, we could simply do a borrow from a Py<PyBytes>. I believe this should be safe for the same reasons.

alex avatar Jun 26 '21 20:06 alex

The hard bit is how to correctly constrain this to python-owned data. For example, &[u8] isn't enough; if we enabled something like the below I can easily write unsound code:

#[pyclass]
struct WithBorrowedData<'py> {
    py: Python<'py>,
    data: &'py [u8],
}

// trivial to create a function which invalidates lifetimes
#[pyfunction]
fn use_after_free(py: Python) -> PyResult<PyObject> {
    let data = vec![];
    Py::new(WithBorrowedData { py, data: data.as_slice() }.into()
}

// The return value of use_after_free contains an invalid data reference

I've been thinking that PyRef (or something similar) would need to be used as a wrapper. Maybe a PyRef::map function would be useful.

Tbh this potentially also ties in with #1308 and how we represent owned / borrowed python data in general.

davidhewitt avatar Jun 27 '21 08:06 davidhewitt

Yes, I think the desired behavior is something like:

#[ouroboros::self_referencing]
struct Foo {
    owner: Py<PyBytes>,
    #[borrows(owner)]
    slice: &'this [u8]
}

#[pyfunction]
fn f(data: Py<PyBytes>) {
    Foo::new(data, |data| data.as_bytes())
}

Effectively the same pattern as https://github.com/pyca/cryptography/blob/main/src/rust/src/ocsp.rs#L32-L42, except without copying to a Vec.

This should be safe because if you have a Py<PyBytes> you own a reference to it, and PyBytes are immutable, so it will never be freed behind your back.

alex avatar Jun 27 '21 12:06 alex

If I understand correctly, reason that the above doesn't currently work with #[ourobouros::self_referencing] is because the slice access closure doesn't have access to a Python GIL token? It seems like that should be solvable by providing an equivalent of ourobouros::self_referencing inside PyO3, but that is probably a lot of work...

davidhewitt avatar Jun 27 '21 21:06 davidhewitt

Yeah, the ideal thing is probably finding a way for them to play nicely together. I'm not sure how complicated that is.

On Sun, Jun 27, 2021 at 5:04 PM David Hewitt @.***> wrote:

If I understand correctly, reason that the above doesn't currently work with #[ourobouros::self_referencing] is because the slice access closure doesn't have access to a Python GIL token? It seems like that should be solvable by providing an equivalent of ourobouros::self_referencing inside PyO3, but that is probably a lot of work...

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/PyO3/pyo3/issues/1089#issuecomment-869223389, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAGBAYPDQ65FYSC3JKMSTTU6G4RANCNFSM4PXQQD5A .

-- All that is necessary for evil to succeed is for good people to do nothing.

alex avatar Jun 27 '21 21:06 alex

This is my attempt at allowing a PyClass to hold a reference to another PyClass. I think it is safe, but not sure if it has other issues.

https://gist.github.com/b05902132/de18debe9039473aa9b0bed6b48436c8

As an example, I wrote a simple class backed by Vec and its iterator. Hope this helps.

jovenlin0527 avatar Nov 16 '21 10:11 jovenlin0527

👍 thanks, that's very interesting and roughly along the lines of what I was thinking. It would be nice to be able to handle that mapped value as a real Python object too. That'll provide a way for us to eventually avoid cloning in #1358

(This will probably require lots of changes to the internal of PyCell.)

davidhewitt avatar Nov 17 '21 23:11 davidhewitt

Hitting this again in https://github.com/messense/tree-sitter-py/blob/05d5adde312afd22c8ac9f59f07731e17cf9a71a/src/lib.rs#L116-L122 but this time ouroboros can't help. The error message when uncommented is

error[E0495]: cannot infer an appropriate lifetime for autoref due to conflicting requirements
   --> src/lib.rs:118:18
    |
118 |             self.borrow_cursor().node()
    |                  ^^^^^^^^^^^^^
    |
note: first, the lifetime cannot outlive the anonymous lifetime defined on the method body at 116:13...
   --> src/lib.rs:116:13
    |
116 |     fn node(&self) -> PyNode {
    |             ^^^^^
note: ...so that reference does not outlive borrowed content
   --> src/lib.rs:118:13
    |
118 |             self.borrow_cursor().node()
    |             ^^^^
note: but, the lifetime must be valid for the anonymous lifetime #1 defined on the body at 117:49...
   --> src/lib.rs:117:49
    |
117 |           PyNode::new(self.borrow_tree().clone(), |tree| {
    |  _________________________________________________^
118 | |             self.borrow_cursor().node()
119 | |         })
    | |_________^
note: ...so that the expression is assignable
   --> src/lib.rs:118:13
    |
118 |             self.borrow_cursor().node()
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: expected `Node<'_>`
               found `Node<'_>`

Basically tree-sitter has aTree struct that can produce a Node<'_> and a TreeCursor<'_>, then TreeCursor<'_> can also produce a Node<'_>, it use PhantomData to enforce lifetime variance.

messense avatar Nov 20 '21 02:11 messense

Note to self: when looking at the internals of #[pyclass] to make this possible, also think about use of Box / Arc and how is might impact adding conversion implementations for them, xref #3014 #3729

davidhewitt avatar Feb 03 '24 21:02 davidhewitt

FYI other Mercurial developers and I added this feature (or at least something similar) in rust-cpython back in 2019: http://dgrunwald.github.io/rust-cpython/doc/cpython/struct.PySharedRefCell.html

I don't know if something similar can be achieved, but I figured you'd like to know. I would certainly need this feature to actually use PyO3 for non-trivial classes within the context of Mercurial.

Alphare avatar Feb 12 '24 17:02 Alphare

Fwiw the thing I described above, borrowing from a PyByted works, and is what cryptography uses these Days

alex avatar Feb 12 '24 17:02 alex

From the linked docs, it seems that using this requires unsafe code? If so, one can already use unsafe code to replace lifetimes by 'static. I think this issue is about providing a safe interface to achieve this.

adamreichold avatar Feb 12 '24 17:02 adamreichold

We have some unsafe related to other parts of our ouroborous stuff, but just having a strict that borrows a PyByted is safe

On Mon, Feb 12, 2024, 12:15 PM Adam Reichold @.***> wrote:

From the linked docs, it seems that using this requires unsafe code? If so, one can already use unsafe code to replace lifetimes by 'static. I think this issue is about providing a safe interface to achieve this.

— Reply to this email directly, view it on GitHub https://github.com/PyO3/pyo3/issues/1089#issuecomment-1939179165, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAGBCBMARJUTQ6YBUCVKLYTJE2FAVCNFSM4PXQQD5KU5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCOJTHEYTOOJRGY2Q . You are receiving this because you commented.Message ID: @.***>

alex avatar Feb 12 '24 17:02 alex

Sorry, this was us racing GitHub comments. I was referring to PySharedRefCell apparently requiring unsafe.

adamreichold avatar Feb 12 '24 17:02 adamreichold