Upcoming changes to PyO3's API in 0.21
Update: to avoid churn we have decided to keep Py<T> as-is and introduce the new API as Bound<'py, PyAny>. This naming can be revisited in future, but we have enough breaking changes with this 0.21 release that we do not wish to ship a wholesale rename at this point. (See https://github.com/PyO3/pyo3/issues/3674#issuecomment-1864891336)
The tracking issue of the remaining work to implement and make Bound<'py, PyAny> a public API is in #3684
We are also building documentation at https://pyo3.rs/main/migration#from-020-to-021
This is a ticket to pin to give visibility to discussion and feedback of an overhaul we're considering for PyO3's API, projected for 0.21 release later this year.
TLDR; we plan to move types like &'py PyAny to be stored instead in a smart pointer type Py<'py, PyAny>. The existing Py<T> type will continue to exist, but renamed, maybe to PySend<T> or PyStatic<T> (name ideas welcome).
We are doing this because we believe it will give an API that is faster and uses less memory. The &'py PyAny types rely on an internal data structure we call the "reference pool", which is the cause of memory frustrations such as #1056. The pool is thread-local, so every access to it also carries a performance overhead. From benchmarks in prototypes I think as much as 30% overhead can be removed when doing lots of operations with Python objects.
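To make the memory point concrete, here is a minimal std-only sketch of the difference between parking temporaries in a thread-local pool and owning them through a value that cleans up on drop. It is a toy model, not PyO3's implementation: `Obj`, `POOL`, `create_pooled`, `drain_pool` and the `peak_*` helpers are all hypothetical names invented for illustration.

```rust
use std::cell::{Cell, RefCell};

thread_local! {
    // Number of toy objects currently alive on this thread.
    static LIVE: Cell<usize> = Cell::new(0);
    // Toy stand-in for the "reference pool": objects parked here stay
    // alive until the pool is drained.
    static POOL: RefCell<Vec<Obj>> = RefCell::new(Vec::new());
}

/// Hypothetical stand-in for a Python object; counts itself in LIVE.
struct Obj;

impl Obj {
    fn new() -> Obj {
        LIVE.with(|n| n.set(n.get() + 1));
        Obj
    }
}

impl Drop for Obj {
    fn drop(&mut self) {
        LIVE.with(|n| n.set(n.get() - 1));
    }
}

fn live() -> usize {
    LIVE.with(|n| n.get())
}

/// Pool style: the new object is handed to the thread-local pool.
fn create_pooled() {
    let obj = Obj::new();
    POOL.with(|p| p.borrow_mut().push(obj));
}

/// Releases everything the pool is holding.
fn drain_pool() {
    let drained = POOL.with(|p| p.take());
    drop(drained);
}

/// Peak number of live objects when `n` temporaries go through the pool.
fn peak_with_pool(n: usize) -> usize {
    let mut peak = 0;
    for _ in 0..n {
        create_pooled();
        peak = peak.max(live());
    }
    drain_pool();
    peak
}

/// Peak number of live objects when each temporary is owned and dropped.
fn peak_with_owned(n: usize) -> usize {
    let mut peak = 0;
    for _ in 0..n {
        let obj = Obj::new();
        peak = peak.max(live());
        drop(obj);
    }
    peak
}

fn main() {
    println!("pool peak:  {}", peak_with_pool(1000));
    println!("owned peak: {}", peak_with_owned(1000));
}
```

In this model the pooled loop keeps every temporary alive until the drain, while the owned loop never holds more than one at a time. Every pooled creation also has to go through the thread-local `POOL`, which loosely mirrors the access overhead described above.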
There will be some work needed to migrate to this new API, mostly due to changes in ownership semantics. I think it may be possible to expose the existing &'py PyAny API in a separate crate (say pyo3_pool, name to be bikeshed) to make migration easier.
There has been a lot of thought put into this API design over a large span of time. We want to preserve as many existing semantics as possible and to provide as simple a migration path as possible from current PyO3 code. We want PyO3 to be as fast as possible while also being accessible to Python users coming to Rust for the first time.
For the discussion of the latest prototype of this new API, see https://github.com/PyO3/pyo3/pull/3361
All feedback and questions welcome.
but renamed, maybe to PySend<T> or PyStatic<T> (name ideas welcome).
I was somewhat enamoured with PyUngil<T> matching the Ungil trait.
I could live with that, even if we have to rename it again in a nogil future.
That's right, this might unnecessarily complicate things. My second least favourite was PyDetached because it comes with nicely fitting verbs attach/detach for methods to switch between Py/PyDetached.
Yes I quite like PyDetached too
I am new to PyO3. Today I ran a comparison test as follows:
A Python script calls the Rust extension (app.pyd), which reads a 1GB partition from a 41GB file and returns a 1GB vector to Python; Python then passes this vector back to Rust, which writes the 1GB file to disk. Total time is 25s.
_, csv_meta = pr.get_csv_partition_address(file_path, 1000)  # where 1000 is 1000MB
csv_vector, csv_meta = pr.get_file_partition(file_path, csv_meta, csv_meta.partition_address[0], csv_meta.partition_address[1])
pr.write_csv(csv_vector, csv_meta)
If I instead call the Rust binary app from other Rust code, total time is 1.2s.
let (_, csv_meta) = get_csv_partition_address(&file_path, 1000);
let csv_vector = get_file_partition(&file_path, csv_meta.partition_address[0], csv_meta.partition_address[1]);
write_csv(csv_vector, csv_meta.clone());
I am exploring whether it is possible to implement zero copy of dataset between Python and Rust.
I use Python mainly so that users can configure the ETL workflow of my Rust app. In many scenarios a large dataset returned from Rust is passed back to Rust without any modification in Python, and I want this scenario to support zero copy.
After modifying some settings with PyO3, Python->Rust vs Rust->Rust:
- Previously: 25s vs 1.2s
- Now: 1.4s vs 1.2s
I use a 41GB file for the following test, output 1 partition of 1GB to disk
import peakrs as pr
df = pr.get_csv_partition_address(file_path, 1000)  # 1000MB
df = pr.get_file_partition(file_path, df, df.partition_address[0], df.partition_address[1])
pr.write_csv(df, "Test_Result.csv")
@hkpeaks Please do not hijack this issue, which is specific to the upcoming API changes. If you would like to discuss your experiments with us, please open a separate discussion. Thank you.
PyO3 is exceptionally useful for my Rust project; the question of how to achieve zero-copy movement of datasets between Rust and Python has been solved.
I propose that PyCell<'py, T> becomes an alias for Py<'py, T>, and we can deprecate the alias. That way the API for pyclass and native types becomes more uniform.
(We would still need a different PyCell<T> internally for all the borrow tracking, we can just detach it from the public API.)
What would the equivalent of borrow(), try_borrow(), etc. be with this approach?
The existing Py<T> already has borrow(), try_borrow() etc (if T: PyClass), so I would think it would work exactly the same. Py<'py, T> would have borrow(), try_borrow() etc if T: PyClass.
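As a rough illustration of what borrow() and try_borrow() mean on such a pointer, here is a std-only sketch that models the runtime borrow tracking with RefCell. This is an analogy only, assuming the borrow rules behave like RefCell's (shared borrows may coexist, a mutable borrow must be exclusive); `Handle`, `Counter` and `demo` are hypothetical names and not PyO3 types.

```rust
use std::cell::{BorrowError, BorrowMutError, Ref, RefCell, RefMut};
use std::rc::Rc;

/// Hypothetical stand-in for a smart pointer to a #[pyclass] value.
/// Rc models shared ownership, RefCell models runtime borrow tracking.
struct Handle<T>(Rc<RefCell<T>>);

impl<T> Handle<T> {
    fn new(value: T) -> Self {
        Handle(Rc::new(RefCell::new(value)))
    }

    /// Shared borrow; panics if a mutable borrow is active.
    fn borrow(&self) -> Ref<'_, T> {
        self.0.borrow()
    }

    /// Shared borrow that reports a conflict instead of panicking.
    fn try_borrow(&self) -> Result<Ref<'_, T>, BorrowError> {
        self.0.try_borrow()
    }

    /// Mutable borrow that reports a conflict instead of panicking.
    fn try_borrow_mut(&self) -> Result<RefMut<'_, T>, BorrowMutError> {
        self.0.try_borrow_mut()
    }
}

struct Counter {
    value: u32,
}

/// Returns (mutable borrow was refused, second shared borrow was allowed,
/// final counter value).
fn demo() -> (bool, bool, u32) {
    let handle = Handle::new(Counter { value: 0 });

    let shared = handle.borrow();
    // While a shared borrow is alive, a mutable borrow is refused at runtime.
    let mut_refused = handle.try_borrow_mut().is_err();
    // A second shared borrow is still fine.
    let shared_ok = handle.try_borrow().is_ok();
    drop(shared);

    // With no outstanding borrows, the mutable borrow succeeds.
    handle.try_borrow_mut().unwrap().value += 1;
    let value = handle.borrow().value;
    (mut_refused, shared_ok, value)
}

fn main() {
    let (mut_refused, shared_ok, value) = demo();
    println!("mut refused: {mut_refused}, shared ok: {shared_ok}, value: {value}");
}
```

The point of the sketch is only that the checks happen at runtime on the pointer itself, so the same methods can be offered regardless of whether the pointer type carries a lifetime.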
Ah. Well then, I'd never noticed that. Sorry for the noise!
All good, worth stating explicitly!
I'm not sure if this qualifies as "API change", but I'd love for modules to have an m_free and m_clear method, that explicitly releases the memory and de-initializes them.
@holzschu I agree that'd be a nice feature but unrelated to the proposal in the OP. Probably best discussed on #3294 or similar proposals.
One thing which I noted when implementing #3531 is that if multiple trait methods with the same name are applicable for one type we get ambiguity; avoiding that would be the advantage of inherent methods.
I think we are expecting to have a 1:1 mapping of traits to types so this should not be a problem. How I noticed this is because our (internal) Py2::borrowed_from_gil_ref is generic over T: AsTypeInfo and takes &T::AsRefTarget as the input, so that means that the output Py2<T> is not fully constrained and needs additional annotating sometimes.
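For anyone unfamiliar with the ambiguity being described, a small plain-Rust sketch follows. The trait and type names (`ListMethods`, `DictMethods`, `Obj`, `WithInherent`) are hypothetical and only illustrate the language rule; they are not the traits from #3531.

```rust
/// Hypothetical type that two method traits both apply to.
struct Obj;

trait ListMethods {
    fn len(&self) -> usize;
}

trait DictMethods {
    fn len(&self) -> usize;
}

impl ListMethods for Obj {
    fn len(&self) -> usize {
        3
    }
}

impl DictMethods for Obj {
    fn len(&self) -> usize {
        2
    }
}

/// Hypothetical type that has an inherent `len` as well as a trait `len`.
struct WithInherent;

impl WithInherent {
    fn len(&self) -> usize {
        1
    }
}

impl ListMethods for WithInherent {
    fn len(&self) -> usize {
        3
    }
}

fn lengths() -> (usize, usize, usize) {
    let obj = Obj;
    // `obj.len()` would not compile here: both traits are in scope and
    // both provide `len`, so the call is ambiguous.
    let via_list = ListMethods::len(&obj);
    let via_dict = <Obj as DictMethods>::len(&obj);
    // An inherent method takes priority over a trait method with the same
    // name and receiver, so method-call syntax is unambiguous.
    let via_inherent = WithInherent.len();
    (via_list, via_dict, via_inherent)
}

fn main() {
    println!("{:?}", lengths());
}
```

With a 1:1 mapping of traits to types, only one trait's `len` is ever applicable to a given type, so the fully qualified calls above should not be needed in practice.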
Since we've got quite a few features and a couple of breaking changes beginning to take shape, I'm beginning to think it'll make sense to release 0.21 separate to (before?) this major API overhaul.
In the meanwhile I'll continue to chip away at the implementations like #3572, so hopefully if this doesn't make it into 0.21 we can land alpha releases of 0.22 with this API change committed shortly after.
Just a status update here: after discussion of #3668 the leaning in https://github.com/PyO3/pyo3/issues/3674#issuecomment-1864333737 was that it makes sense to release this new API in 0.21, so for better or worse 0.21 is going to be quite a large release!
PyO3 0.21 is released, this is now done.
This change to PyO3 is great! I'm very excited by the prospect of an almost zero-cost but safe layer around the CPython API.
While studying these changes and PyO3 in general, I wonder: why was GILPool even introduced in the first place? Why wasn't something like Bound present in PyO3 since the beginning? I would be very grateful for some historic pointers.
Does gil-refs memory management have advantages over the new way? Or was it simpler to implement? Or just more obvious to come up with?
Thanks!
While studying these changes and PyO3 in general, I wonder: why was GILPool even introduced in the first place? Why wasn't something like Bound present in PyO3 since the beginning? I would be very grateful for some historic pointers.
A great question. The GIL Refs design was made long ago (it predates me), and perhaps at the time it wasn't realised what the performance consequences were likely to be. You can find it in early commits to this repository but with very little discussion. The main advantage of GIL Refs was simplicity. The syntax &PyAny is concise and allowed users to avoid writing lifetimes in many cases, which was certainly very friendly for newcomers!