Upcoming changes to PyO3's API in 0.21

Open davidhewitt opened this issue 2 years ago • 19 comments

Update: to avoid churn we have decided to keep Py<T> as-is and introduce the new API as Bound<'py, PyAny>. This naming can be revisited in future, but we have enough breaking changes with this 0.21 release that we do not wish to ship a wholesale rename at this point. (See https://github.com/PyO3/pyo3/issues/3674#issuecomment-1864891336)

The tracking issue of the remaining work to implement and make Bound<'py, PyAny> a public API is in #3684

We are also building documentation at https://pyo3.rs/main/migration#from-020-to-021


This is a ticket to pin to give visibility to discussion of, and feedback on, an overhaul we're considering for PyO3's API, projected for the 0.21 release later this year.

TL;DR: we plan to move types like &'py PyAny to be stored instead in a smart pointer type Py<'py, PyAny>. The existing Py<T> type will continue to exist, but renamed, maybe to PySend<T> or PyStatic<T> (name ideas welcome).

We are doing this because we believe it will offer a faster API with lower memory usage. The &'py PyAny types rely on an internal data structure we call the "reference pool", and it's the cause of memory frustrations such as #1056. It's a thread-local data structure, so it also has a performance overhead whenever it is accessed. From benchmarks in prototypes I think as much as 30% overhead can be removed when doing lots of operations with Python objects.
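To make the memory argument concrete, here is a minimal std-only sketch. It is a toy model and not PyO3's implementation: `Owned`, `Pool`, and both `peak_*` functions are hypothetical names, and the model assumes that a pool keeps every registered reference alive until the pool itself is released, while a smart pointer releases its reference as soon as it is dropped.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// Hypothetical stand-in for an owned Python reference.
/// Creating one bumps a shared "live" counter; dropping it decrements the counter.
struct Owned {
    live: Rc<Cell<usize>>,
}

impl Owned {
    fn new(live: &Rc<Cell<usize>>) -> Self {
        live.set(live.get() + 1);
        Owned { live: Rc::clone(live) }
    }
}

impl Drop for Owned {
    fn drop(&mut self) {
        self.live.set(self.live.get() - 1);
    }
}

/// Toy reference pool (an assumption of this sketch): it holds every
/// registered reference until the pool itself goes out of scope.
#[derive(Default)]
struct Pool {
    held: RefCell<Vec<Owned>>,
}

impl Pool {
    fn register(&self, obj: Owned) {
        self.held.borrow_mut().push(obj);
    }
}

/// Peak number of live references when every temporary goes through the pool.
fn peak_with_pool(iterations: usize) -> usize {
    let live = Rc::new(Cell::new(0));
    let pool = Pool::default();
    let mut peak = 0;
    for _ in 0..iterations {
        pool.register(Owned::new(&live));
        peak = peak.max(live.get());
    }
    peak
}

/// Peak number of live references when each temporary is dropped right away.
fn peak_with_smart_pointer(iterations: usize) -> usize {
    let live = Rc::new(Cell::new(0));
    let mut peak = 0;
    for _ in 0..iterations {
        let tmp = Owned::new(&live);
        peak = peak.max(live.get());
        drop(tmp);
    }
    peak
}
```

In this model a loop of 1000 temporaries peaks at 1000 live references with the pool and at 1 with the smart pointer, which is the kind of growth inside long-running loops that the paragraph above describes.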

There will be some work needed to migrate to this new API, mostly due to changes in ownership semantics. I think it may be possible to expose the existing &'py PyAny API in a separate crate (say pyo3_pool, name to be bikeshed) to make migration easier.

A lot of thought has been put into this API design over a long period of time. We want to preserve as many existing semantics as possible and to provide as simple a migration path as possible from current PyO3 code. We want PyO3 to be as fast as possible while also being accessible to Python users coming to Rust for the first time.

For the discussion of the latest prototype of this new API, see https://github.com/PyO3/pyo3/pull/3361

All feedback and questions welcome.

davidhewitt avatar Aug 11 '23 21:08 davidhewitt

but renamed, maybe to PySend<T> or PyStatic<T> (name ideas welcome).

I was somewhat enamoured with PyUngil<T> matching the Ungil trait.

adamreichold avatar Aug 12 '23 06:08 adamreichold

I could live with that, even if we have to rename it again in a nogil future.

davidhewitt avatar Aug 12 '23 06:08 davidhewitt

That's right, this might unnecessarily complicate things. My second least favourite was PyDetached because it comes with nicely fitting verbs attach/detach for methods that switch between Py/PyDetached.

adamreichold avatar Aug 12 '23 06:08 adamreichold

Yes I quite like PyDetached too

davidhewitt avatar Aug 12 '23 06:08 davidhewitt

I am new to PyO3. Today I ran a comparison test as follows:

A Python script calls the Rust extension (app.pyd), which reads a 1GB partition from a 41GB file and returns it to Python as a 1GB vector. Python then passes this vector back to Rust, which writes it to disk as a 1GB file. Total time is 25s.

_, csv_meta = pr.get_csv_partition_address(file_path, 1000)  # where 1000 is 1000MB
csv_vector, csv_meta = pr.get_file_partition(file_path, csv_meta, csv_meta.partition_address[0], csv_meta.partition_address[1])
pr.write_csv(csv_vector, csv_meta)

When the same steps are driven from Rust code instead of Python, the total time is 1.2s.

let (_, csv_meta) = get_csv_partition_address(&file_path, 1000);
let csv_vector = get_file_partition(&file_path, csv_meta.partition_address[0], csv_meta.partition_address[1]);
write_csv(csv_vector, csv_meta.clone());

I am exploring whether it is possible to implement zero-copy transfer of a dataset between Python and Rust.

I use Python mainly so that users can configure the ETL workflow of my Rust app. In many scenarios a large dataset returned from Rust is passed back to Rust without any modification in Python, and I would like that scenario to support zero copy.

hkpeaks avatar Aug 12 '23 14:08 hkpeaks

After modifying some settings with PyO3, Python->Rust vs Rust->Rust:

  • Previously: 25s vs 1.2s
  • Now: 1.4s vs 1.2s

I used a 41GB file for the following test, writing one 1GB partition to disk:

import peakrs as pr
df = pr.get_csv_partition_address(file_path, 1000)  # 1000 MB
df = pr.get_file_partition(file_path, df, df.partition_address[0], df.partition_address[1])
pr.write_csv(df, "Test_Result.csv")

hkpeaks avatar Aug 13 '23 08:08 hkpeaks

@hkpeaks Please do not hijack this issue, which is specific to the upcoming API changes. If you would like to discuss your experiments with us, please open a separate discussion. Thank you.

adamreichold avatar Aug 13 '23 08:08 adamreichold

PyO3 is exceptionally useful for my Rust project. The question of how to achieve zero-copy movement of a dataset between Rust and Python has been solved.

hkpeaks avatar Aug 13 '23 14:08 hkpeaks

I propose that PyCell<'py, T> becomes an alias for Py<'py, T>, and we can deprecate the alias. That way the API for pyclass and native types becomes more uniform.

davidhewitt avatar Aug 18 '23 13:08 davidhewitt

(We would still need a different PyCell<T> internally for all the borrow tracking, we can just detach it from the public API.)

davidhewitt avatar Aug 18 '23 13:08 davidhewitt

What would the equivalent of borrow(), try_borrow(), etc. be with this approach?

alex avatar Aug 18 '23 13:08 alex

The existing Py<T> already has borrow(), try_borrow() etc (if T: PyClass), so I would think it would work exactly the same. Py<'py, T> would have borrow(), try_borrow() etc if T: PyClass.
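For readers who have not used these methods, the runtime borrow rules can be sketched with `std::cell::RefCell` as a stand-in. This is only an analogy under the assumption that the pyclass borrow tracking follows the same shared-or-exclusive rules. `Counter` and `borrow_rules` are hypothetical names, and no PyO3 types are involved.

```rust
use std::cell::RefCell;

/// Hypothetical payload, standing in for a `#[pyclass]` value.
struct Counter {
    value: u32,
}

/// Returns (shared borrow allowed while unborrowed,
///          shared borrow refused while mutably borrowed,
///          final value after the mutation).
fn borrow_rules() -> (bool, bool, u32) {
    let cell = RefCell::new(Counter { value: 0 });

    // No outstanding borrows: the fallible shared borrow succeeds.
    let ok_when_free = cell.try_borrow().is_ok();

    // While an exclusive borrow is alive, the fallible shared borrow returns
    // an error instead of panicking, which is the point of the `try_` variant.
    let mut guard = cell.borrow_mut();
    guard.value += 1;
    let refused_while_mutating = cell.try_borrow().is_err();
    drop(guard);

    // Once the exclusive borrow is released, a plain shared borrow works again.
    let value = cell.borrow().value;
    (ok_when_free, refused_while_mutating, value)
}
```

The sketch only covers the borrow rules. Which pointer type the methods are called on is the part discussed in this comment.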

davidhewitt avatar Aug 18 '23 13:08 davidhewitt

Ah. Well then, I'd never noticed that. Sorry for the noise!

alex avatar Aug 18 '23 13:08 alex

All good, worth stating explicitly!

davidhewitt avatar Aug 18 '23 13:08 davidhewitt

I'm not sure if this qualifies as "API change", but I'd love for modules to have an m_free and m_clear method, that explicitly releases the memory and de-initializes them.

holzschu avatar Aug 23 '23 12:08 holzschu

@holzschu I agree that'd be a nice feature but unrelated to the proposal in the OP. Probably best discussed on #3294 or similar proposals.

davidhewitt avatar Aug 23 '23 21:08 davidhewitt

One thing which I noted when implementing #3531 is that if multiple trait methods with the same name are applicable to one type, we get ambiguity, and avoiding that would be the advantage of inherent methods.

I think we are expecting to have a 1:1 mapping of traits to types so this should not be a problem. How I noticed this is because our (internal) Py2::borrowed_from_gil_ref is generic over T: AsTypeInfo and takes &T::AsRefTarget as the input, so that means that the output Py2<T> is not fully constrained and needs additional annotating sometimes.
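The ambiguity, and how inherent methods avoid it, can be reproduced in plain Rust without PyO3. In this sketch `ListMethods`, `DictMethods`, `TraitOnly`, and `WithInherent` are hypothetical names chosen for illustration. They are not PyO3 traits or types.

```rust
// Two hypothetical method traits that happen to share a method name.
trait ListMethods {
    fn describe(&self) -> &'static str;
}

trait DictMethods {
    fn describe(&self) -> &'static str;
}

/// A type that only gets `describe` through the two traits.
struct TraitOnly;

impl ListMethods for TraitOnly {
    fn describe(&self) -> &'static str {
        "list"
    }
}

impl DictMethods for TraitOnly {
    fn describe(&self) -> &'static str {
        "dict"
    }
}

// Writing `value.describe()` here would fail to compile with
// error[E0034] (multiple applicable items in scope), because both traits
// are in scope and both apply. Fully qualified syntax picks one explicitly.
fn describe_as_list(value: &TraitOnly) -> &'static str {
    ListMethods::describe(value)
}

fn describe_as_dict(value: &TraitOnly) -> &'static str {
    <TraitOnly as DictMethods>::describe(value)
}

/// A type that also has an inherent `describe` method.
struct WithInherent;

impl WithInherent {
    fn describe(&self) -> &'static str {
        "inherent"
    }
}

impl ListMethods for WithInherent {
    fn describe(&self) -> &'static str {
        "list"
    }
}

impl DictMethods for WithInherent {
    fn describe(&self) -> &'static str {
        "dict"
    }
}

// Method-call syntax compiles here: the inherent method takes priority
// over trait methods of the same name, so there is no ambiguity.
fn describe_inherent(value: &WithInherent) -> &'static str {
    value.describe()
}
```

With a 1:1 mapping of traits to types, the first situation should not arise in practice, which matches the expectation stated above.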

davidhewitt avatar Oct 21 '23 15:10 davidhewitt

Since we've got quite a few features and a couple of breaking changes beginning to take shape, I'm beginning to think it'll make sense to release 0.21 separate to (before?) this major API overhaul.

In the meantime I'll continue to chip away at the implementations like #3572, so hopefully if this doesn't make it into 0.21 we can land alpha releases of 0.22 with this API change committed shortly after.

davidhewitt avatar Nov 23 '23 00:11 davidhewitt

Just a status update here: after discussion of #3668 the leaning in https://github.com/PyO3/pyo3/issues/3674#issuecomment-1864333737 was that it makes sense to release this new API in 0.21, so for better or worse 0.21 is going to be quite a large release!

davidhewitt avatar Jan 18 '24 15:01 davidhewitt

PyO3 0.21 is released, so this is now done.

davidhewitt avatar Mar 29 '24 15:03 davidhewitt

This change to PyO3 is great! I'm very excited by the prospect of an almost zero-cost but safe layer around the CPython API.

While studying these changes and PyO3 in general, I wonder: why was GILPool even introduced in the first place? Why wasn't something like Bound present in PyO3 since the beginning? I would be very grateful for some historic pointers.

Does gil-refs memory management have advantages over the new way? Or was it simpler to implement? Or just more obvious to come up with?

Thanks!

grothesque avatar May 20 '24 18:05 grothesque

While studying these changes and PyO3 in general, I wonder: why was GILPool even introduced in the first place? Why wasn't something like Bound present in PyO3 since the beginning? I would be very grateful for some historic pointers.

A great question. The GIL Refs design was made long ago (it predates me), and perhaps at the time it wasn't realised what the performance consequences were likely to be. You can find it in early commits to this repository, but with very little discussion. The main advantage of GIL Refs was simplicity. The syntax &PyAny is concise and allowed users to avoid writing lifetimes in many cases, which was certainly very friendly for newcomers!

davidhewitt avatar May 21 '24 19:05 davidhewitt