pyo3
pyo3 copied to clipboard
Add performance suggestions to docs
Hi, thanks so much for pyo3 - it's wonderful, I'm using it a lot (see rtoml and watchfiles, I might be using it pydantic soon too).
I wonder if it would be worth adding a "Performance Suggestions" or similar section to the docs, perhaps the FAQs?
Examples:
cast_as
vs extract
I spent a lot of the morning very worried that pyo3 rust making heavy use of python types was actually slower than python, until by chance I tried cast_as
instead of extract
.
// fast
let list: &PyList = py_any.cast_as()?;
// slow
let list: &PyList = py_any.extract()?;
This change in my case made a ~4x improvement to performance.
Optimised compilation
While investigating the above, I noticed this SO question referenced on #1470 didn't mention that the performance is massively effected by debug/release compilation. I've added an answer to the question with suggestions.
These are both no doubt very obvious to the maintainers of this library and sessioned users, but to those new to rust and pyo3, they might be less obvious.
I'm sure there are other things which can have a dramatic effect on performance which I'm not ware of - I've love to hear any other general suggestions?
Update:
Other things to suggest when this section is written:
- interning strings where they're being used a lot, a good example is if you're creating a dict with common keys, performance can be significantly improved by reusing
PyString
s rather than rust strings
I spent a lot of the morning very worried that pyo3 rust making heavy use of python types was actually slower than python, until by chance I tried cast_as instead of extract.
I am surprised by this. Shouldn't both cast_as
and extract
for a native type end up calling PyTryFrom::try_from
? cast_as
directly and extract
via pyobject_native_type_extract!
which should both end up here.
Is it possible to show that performance difference using a self-contained reproducer?
I tried the following benchmark
diff --git a/benches/bench_frompyobject.rs b/benches/bench_frompyobject.rs
index 8cf8a4da7..ca4b80bb2 100644
--- a/benches/bench_frompyobject.rs
+++ b/benches/bench_frompyobject.rs
@@ -1,6 +1,6 @@
-use criterion::{criterion_group, criterion_main, Bencher, Criterion};
+use criterion::{black_box, criterion_group, criterion_main, Bencher, Criterion};
-use pyo3::{prelude::*, types::PyString};
+use pyo3::{prelude::*, types::{PyList, PyString}};
#[derive(FromPyObject)]
enum ManyTypes {
@@ -18,8 +18,30 @@ fn enum_from_pyobject(b: &mut Bencher<'_>) {
})
}
+fn list_via_cast_as(b: &mut Bencher<'_>) {
+ Python::with_gil(|py| {
+ let any: &PyAny = PyList::empty(py).into();
+
+ b.iter(|| {
+ let _list: &PyList = black_box(any).cast_as().unwrap();
+ });
+ })
+}
+
+fn list_via_extract(b: &mut Bencher<'_>) {
+ Python::with_gil(|py| {
+ let any: &PyAny = PyList::empty(py).into();
+
+ b.iter(|| {
+ let _list: &PyList = black_box(any).extract().unwrap();
+ });
+ })
+}
+
fn criterion_benchmark(c: &mut Criterion) {
c.bench_function("enum_from_pyobject", enum_from_pyobject);
+ c.bench_function("list_via_cast_as", list_via_cast_as);
+ c.bench_function("list_via_extract", list_via_extract);
}
criterion_group!(benches, criterion_benchmark);
and found no measurable difference:
list_via_cast_as time: [1.1823 ns 1.1823 ns 1.1823 ns]
list_via_extract time: [1.8916 ns 1.8918 ns 1.8920 ns]
Could it be that something besides using extract
or cast_as
changed?
Code is currently closed source to avoid confusion, but if I can make it work it will end up being OS.
I'll invite you as a collaborator so you can see what I mean, feel free to ignore the invitation if perfer.
I'll invite you as a collaborator so you can see what I mean, feel free to ignore the invitation if perfer.
Since the commits always change multiple things (like PyObject
and <&PyAny>::extract
to just &PyAny
) or returning an error instead of raising an exception, I would suggest rolling back the mechanical replacements of extract
by cast_as
and checking whether it still makes a difference. (I expect it to not make a difference.)
EDIT: Just noticed the PR you opened doing exactly that.
I wonder if it would be worth adding a "Performance Suggestions" or similar section to the docs, perhaps the FAQs?
Regardless of the outcome of this issue I think this is a great idea 👍
So I think the problem is the following: The benchmarked code is basically the following
if let Ok(list) = any.extract::<&PyList>() { ... } else { ... }
where the else branch is always taken. extract
returns PyResult<T>
instead of Result<T, PyDowncastError>
which means into must call From<PyDowncastError> for PyErr
which will turn the PyDowncastError
into a string here.
The difference is indeed significant as show by the benchmark results:
list_via_cast_as time: [3.0937 ns 3.2319 ns 3.3556 ns]
list_via_extract time: [3.0724 ns 3.0798 ns 3.0897 ns]
not_a_list_via_cast_as time: [3.4965 ns 3.4974 ns 3.4982 ns]
not_a_list_via_extract time: [292.06 ns 292.80 ns 293.61 ns]
given by the diff
diff --git a/benches/bench_frompyobject.rs b/benches/bench_frompyobject.rs
index 8cf8a4da7..86d2047df 100644
--- a/benches/bench_frompyobject.rs
+++ b/benches/bench_frompyobject.rs
@@ -1,6 +1,6 @@
-use criterion::{criterion_group, criterion_main, Bencher, Criterion};
+use criterion::{black_box, criterion_group, criterion_main, Bencher, Criterion};
-use pyo3::{prelude::*, types::PyString};
+use pyo3::{prelude::*, types::{PyList, PyString}};
#[derive(FromPyObject)]
enum ManyTypes {
@@ -18,8 +18,52 @@ fn enum_from_pyobject(b: &mut Bencher<'_>) {
})
}
+fn list_via_cast_as(b: &mut Bencher<'_>) {
+ Python::with_gil(|py| {
+ let any: &PyAny = PyList::empty(py).into();
+
+ b.iter(|| {
+ let _list: &PyList = black_box(any).cast_as().unwrap();
+ });
+ })
+}
+
+fn list_via_extract(b: &mut Bencher<'_>) {
+ Python::with_gil(|py| {
+ let any: &PyAny = PyList::empty(py).into();
+
+ b.iter(|| {
+ let _list: &PyList = black_box(any).extract().unwrap();
+ });
+ })
+}
+
+fn not_a_list_via_cast_as(b: &mut Bencher<'_>) {
+ Python::with_gil(|py| {
+ let any: &PyAny = PyString::new(py, "foobar").into();
+
+ b.iter(|| {
+ black_box(any).cast_as::<PyList>().unwrap_err();
+ });
+ })
+}
+
+fn not_a_list_via_extract(b: &mut Bencher<'_>) {
+ Python::with_gil(|py| {
+ let any: &PyAny = PyString::new(py, "foobar").into();
+
+ b.iter(|| {
+ black_box(any).extract::<&PyList>().unwrap_err();
+ });
+ })
+}
+
fn criterion_benchmark(c: &mut Criterion) {
c.bench_function("enum_from_pyobject", enum_from_pyobject);
+ c.bench_function("list_via_cast_as", list_via_cast_as);
+ c.bench_function("list_via_extract", list_via_extract);
+ c.bench_function("not_a_list_via_cast_as", not_a_list_via_cast_as);
+ c.bench_function("not_a_list_via_extract", not_a_list_via_extract);
}
criterion_group!(benches, criterion_benchmark);
This only affects the Err
path, not the Ok
. But then again, if-else-if-chains to determine the correct type are probably common in Rust-code-called-from-Python-code.
EDIT: I also tried replacing err.to_string()
by String::new
and this is indeed almost all of the slow down. Only a small part is due to PyTypeError::new_err
itself.)
Using something like
#[derive(FromPyObject)]
enum ListOrDictOrAny<'a> {
List(&'a PyList),
Dict(&'a PyDict),
Any(&'a PyAny),
}
also does not help as the generated implementation is probably also an if-else-if chain of calls to extract
.
This is really interesting, thank you so much.
Stuff like this scares me a little, the explanation is beyond my knowledge. If I hadn't happened upon this fix, I would probably have abandoned the entire project assuming the performance gains I had hoped for weren't possible.
Agreed, I've wanted to write some notes along these lines for a long time. I think the main thing holding me off has been lack of time, and I'd kind of been hoping I could make progress in #1308 first (which would reduce a few known PyO3 overheads and likely change a number of performance best practices).
Another point for the performance section:
as per https://github.com/pydantic/pydantic-core/pull/501,
PyErr::new::<CustomError, _>(([args]))
is significantly slower than
let custom_error = CustomError::new([args]);
match Py::new(py, validation_error) {
Ok(err) => PyErr::from_value(err.into_ref(py)),
Err(err) => err,
}
For some errors, I think because PyErr::new
converts args to python objects and back again.
That change (found by chance) improved some of our benchmarks by 30%.
Superseded by #3310