pyo3 Safely passing a mutable reference to Python?

I have a feeling I'm not alone in wanting to pass a mutable Rust reference to Python code. In my case, I want Python code to draw into a Piet RenderContext. There is basically no reasonable way to do this without a mutable reference, for example by copying data.

Having read through the code and some blogs on the issue, I believe I may have a workable answer. Probably easiest to post my code:

https://gist.github.com/raphlinus/479df97a7ba715b494b87af9071e9b87

The short explanation is that the PyCell holds a RefCell to an optional mutable reference. There's also a guard object which clears that reference when the guard is dropped. There is a small amount of unsafe, but it's only to adjust lifetimes; I'm not mucking with raw pointers at all (as I had done in a previous iteration). In particular, I'm relying on RefCell for mutability guarantees, so I'm not particularly worried about that.

I post this issue mostly to ask for review that the approach is sound. It's also possible there's some easier way to do this I'm missing. If it is a good approach, likely it should be documented, as it seems like something a lot of people want to do.

Many thanks for your consideration!

Sep 13 '20 15:09 raphlinus

Any time your depending on Drop being run on some guard for safety, it's trivial to show that your API design is not safe: just mem::forget the guard, which is a safe operation.

Sep 13 '20 17:09 programmerjake

Right, I should have figured that out, thanks for the reminder. What about a scoped API, where I pass the &PyCell to a closure? I'll explore it in any case.

Sep 13 '20 17:09 raphlinus

that could work, see rayon::scope for a similar scoped API.

Sep 13 '20 18:09 programmerjake

The gist has been updated, the second file is my attempt at a scoped version. It feels like it might be possible to refine further, but I'm very interested in feedback on whether it's on the right track.

Sep 13 '20 18:09 raphlinus

Thank you! After looking over the gist for some minutes, I think now I understand your idea. Python and Rust have very different lifetime strategies: one is by reference count, one is by static lifetime analysis... But we can enforce Rust's lifetime to Python objects, by doing obj = None when dropping.

This is a cool idea, but I'm still struggling to understand the context where this technique is usable. To make this work, we need a scope StringRef::scoped or RAII guard. So it is limited to 'Python in Rust code' usage, and then I think even Option can be unnecessary due to Python's single-threaded nature. Do you intend only this usage? Or do you plan to extend this to 'Rust object in Python code' situation?

Sep 14 '20 05:09 kngwyu

Sure, I'm happy to explain the motivation behind it. We want to have a GUI app (our main use case right now is a font editor) where we also have plugins written in Python, and we want those plugins to draw onto a canvas.

That canvas is represented by a &mut RenderTarget object. In particular, the runloop of the GUI app generates a render target, then passes it into the paint method of various widgets. We want to be able to pass that into Python code. Then we will have wrappers where Python code will call methods for drawing paths, etc. So there are both calls from Rust into Python and Python into Rust going on here.

If the Python code were to retain the reference, for example stashing it into global state, then return, it would be a dangling reference that could exploit UB, most likely a crash. We want to prevent that. To be clear, we're willing to tolerate some unsafe patterns on the Rust side, though would want to keep that to a minimum. But we definitely want to make it difficult or impossible for people to crash the program on the Python side by such common errors as use-after-free, type errors such as passing an image where a gradient was expected, etc.

The type that I wrote is clunky for sure, with lots of options and two different flavors of refcell; I think it's likely it could be streamlined, but didn't clearly see how to do that.

Feel free to ask any questions. Not to brag, but this is not the first time I've tried to integrate cross-platform GUI with Python. You can see some of my experiments from 14 or so years go :)

Sep 14 '20 05:09 raphlinus

Interesting application, thank you for the detailed explanation! I'll do my best to give helpful reviews, but I'm sorry if I have some misunderstanding :)

Your Option approach looks good especially when you run the extension with some context, like:
```
let extension_context = PyDict::new();
py.run(code, None, Some(extension_context)).unwrap(); // Run extension code with some context
```
In such cases, extension programs can set &mut RenderTarget anywhere by setting it to the globals, and thus restricting the period when it is really usable is necessary.
I'm kind of doubtful about the use of RefCell. Though you may already know, PyCell behaves like RefCell when it exposed to Python (see https://docs.rs/pyo3/0.12.0/pyo3/pycell/struct.PyCell.html). So just PyCell<Option<...>> could work, unless it is already managed as Rc<RefCell<...>> or so.
'static can be irritating but now I have no better way to deal with this.

Sep 14 '20 13:09 kngwyu

Ah ok, I think I understand this better now. I agree that the RefCell can be removed, it doesn't add any additional safety.

It seems to me difficult to write a fully safe Rust wrapper for this pattern. The risk is that the &'static mut String reference (or any wrapper that holds this reference) may be moved out of the PyCell that contains it, as all types in Rust can be moved. (I found it helpful to review the documentation for the rental crate and the relatively new Pin api)

But I think we may have a situation that is good enough for our needs. We can have a relatively small amount of unsafe in our code, and then as we only use the reference in a way that respects the scope pattern, it should be safe from the Python side.

It seems maybe possible to express this pattern in a macro, so that all unsafe uses are hidden, but this is beyond my current scope.

I think there may be documentation value in writing about this pattern, what level of safety is possible, where unsafe is necessary, and what conditions must be maintained to ensure the soundness of the resulting system.

Sep 14 '20 15:09 raphlinus

I've made another attempt at the problem, this time using pointers rather than trying to transmute the lifetimes of references. I believe it may address the safety issues of my previous version. Of course, as with all unsafe code, there's a very good chance I've missed something and there is a soundness hole.

Even so, I've tried to be reasonably careful. I did some sanity-checking with miri to establish that the basic pattern of "tunneling" the mutable reference through turning it into a pointer and then re-creating it later is valid. I've also added a py: Python<'p> guard to the borrow method to ensure that the GIL is held, as it would clearly be unsafe to call this method otherwise.

I've also this time improved the signature, so that it takes any T and not just String, and separated the creation of the Rust wrapper object from the Python class creation. Thus, the unsafe should be confined entirely to the safe wrapper itself (which I've put into a separate module to emphasize), and it should be possible to use this with essentially any inner type.

Sep 14 '20 19:09 raphlinus

👍 I've been busy the past few days but lurking on this conversation; I was going to say that pointers are probably better than transmuting lifetimes. I'm keen to take a read of your most recent solution when I have a chance. Thanks for looking into this!

Sep 14 '20 19:09 davidhewitt

I withdraw that particular gist, there's an issue with a dangling pointer to the stack. I think it can be fixed with a bit of boxing.

Sep 14 '20 20:09 raphlinus

Ok, fourth try. This one is based on Rc, and is considerably simpler; I realized the scope function doesn't really need access to the PyCell, so the somewhat terrifying signature is now much less so.

The allocation of the Rc is in service of making the wrapper movable. I can see how to eliminate it if there is an unsafe burden on the client of the API, but otherwise I'm not sure it's possible. The code generated for try_get_mut is well within reason, though one or two branches could be eliminated if the code were specialized: the weak count and strong count are both {0,1}.

I believe this version is worthy of review, but am not yet making the claim it's actually sound. Apologies for the previous false starts; unsafe code is hard.

Sep 14 '20 20:09 raphlinus

The Weak pattern looks cool, especially when the internal mutable object is managed by Rc. But somehow I feel it hard to understand why it is necessary even with scoped objects. Or,

I withdraw that particular gist, there's an issue with a dangling pointer to the stack.

could you please explain this more?

Sep 15 '20 15:09 kngwyu

Why is the Rc necessary? It's because I want to encapsulate just the concept of the safe wrapper of to the reference, and especially make it movable from the safe wrapper into completely arbitrary user code, most likely a constructor for a PyClass object that will embed the wrapped reference as a field.

If I were to tolerate an unsafe burden on the user of the wrapper, and wanted to avoid the Rc, then I would have some mechanism to zero out the wrapper at the end of the scope. But I don't see a clean way to enforce that: the callback of the scope function can take that wrapper and move it out of the scope.

The dangling pointer is a fairly straightforward issue. The obj_ptr_ptr value is a pointer to the obj_ptr variable on the stack frame of scope. That pointer is placed into the SafeWrapper object, whose lifetime can extend past scope itself (although it won't in normal usage). I did test it locally, and it seemed to work, but only because of the coincidence that the location was still zero.

The Rc solves these problems: the Weak container can be moved around freely, and if there is an attempt to access the inner pointer after the scope function has completed, then the attempt to downgrade will fail. The only downside is the extra allocation. For my particular application, I find this a completely acceptable tradeoff. We're not going to call scope very often, even though we might do many (hundreds or thousands) of drawing calls on the context object within that scope.

We don't need the full generality of Rc here, because we never actually clone the Rc. If we were going to make a custom implementation, it could be a Box of the pointer and a bool, all in an unsafe cell. The try_get_mut function would just check whether the pointer is null, and provide Some mutable reference if not; I believe this would compile down to no code, but haven't checked that in Godbolt yet. The beginning of scope would set the bool to false. The end of scope would zero the pointer in any case, and check the bool. If it's true, it would deallocate the box. The drop function on the wrapper would check whether the pointer was null. If so, it would deallocate the box. If not, it would set the bool to true (note that this is the normal case). This behavior is the same as Rc, just with less extra steps. I wouldn't expect a huge improvement in performance though, as the number of allocations is the same.

I hope this explanation is useful. Feel free to ask if there are more questions. Having slept on it, I am more confident that the approach is viable, but obviously would welcome careful review before expecting such a wrapper would be adopted, in case there's still a safety problem I've overlooked, or an opportunity to make it more efficient.

Sep 15 '20 17:09 raphlinus

I don't think you even need Rc or Weak. Assuming I didn't mess up somewhere, the following should work:

#[pyclass]
pub struct MyMutRef {
    v: Option<NonNull<MyValue>>,
}

impl MyMutRef {
    pub fn get(&mut self) -> Option<&mut MyValue> {
        unsafe { self.v.as_mut().map(|v| v.as_mut()) }
    }
    pub fn scope<R>(py: Python<'_>, v: &mut MyValue, f: impl FnOnce(Python<'_>, &Py<MyMutRef>) -> PyResult<R>) -> PyResult<R> {
        struct PanicOnDrop(bool);
        impl Drop for PanicOnDrop {
            fn drop(&mut self) {
                if self.0 {
                    panic!("failed to clear correct MyMutRef instance"); // abort via double-panic
                }
            }
        }
        struct Guard<'py>(Python<'py>, Py<MyMutRef>, NonNull<MyValue>);
        impl Drop for Guard<'_> {
            fn drop(&mut self) {
                let panic_on_drop = PanicOnDrop(true);
                let guard = self.1.borrow_mut(self.0);
                assert_eq!(guard.v.take(), Some(self.2), "MyMutRef instance was moved somewhere else");
                panic_on_drop.0 = false;
            }
        }
        let guard = Guard(py, Py::new(py, MyMutRef { v: Some(v.into()) })?, v.into());
        f(py, &guard.1)
    }
}

Sep 15 '20 18:09 programmerjake

I'll let others critique the above, but I should clearly state another goal: ideally I'd like a safety boundary so that I can add whatever methods I like to the #[pyclass] object, and those methods can't cause unsafety from safe code. If methods are added and the v field remains basically a raw pointer, I can see lots of things that can go wrong.

Sep 15 '20 19:09 raphlinus

The use of Rc has a very subtle soundness hole: it can be moved into a context where Drop is not protected by a 'p lifetime ensuring the drop happens while the GIL is held.

Fortunately, this can be fixed with atomics. I've looked at the generated code in Godbolt and it's very nice, at least on x64. That said, this code feels like it's skating pretty close to the edge wrt soundness. I've written it so that if there is a race (which I believe will only happen when there's an intentional effort), then the box can leak. This is not a soundness issue and I consider it an acceptable consequence. If that were not desired, at the expense of slightly slower code, the atomic operation in the drop function could be replaced with a swap with Ordering::Release, which would compile to an xchg instruction.

Sep 17 '20 15:09 raphlinus