pyo3 icon indicating copy to clipboard operation
pyo3 copied to clipboard

Make `PyDict` iterator compatible with free-threaded build

Open bschoenmaeckers opened this issue 1 year ago • 15 comments

This pulls in the PyCriticalSection_Begin & PyCriticalSection_End functions new in 3.13 and use it to lock the PyDict iterators as d described here. I'm not sure about the PyCriticalSection struct definition. We cannot use the opaque_struct! macro to define this struct because we have to allocate enough space on the stack so we can pass the uninitialized pointer to PyCriticalSection_Begin. So some help would be appreciated!

depends on #4421 related to #4265

bschoenmaeckers avatar Aug 14 '24 13:08 bschoenmaeckers

I actually have a branch with these changes (more or less) that I was planning to do separately from that PR. Unfortunately the deadlock I found is caused by something else.

If you're planning to work on this stuff I'd appreciate it if you could comment on the tracking issue so we can coordinate work and avoid duplication.

ngoldbaum avatar Aug 14 '24 15:08 ngoldbaum

Replace the implementation with PyObject_GetIter and PyIter_Next (slow?)

I think we should seriously consider going this way and benchmark if it's actually a performance concern. We already made the same change for sets a couple of releases back, it wasn't a major performance impact there compared to the wins from the Bound API. Two reasons why we did it for sets:

  • _PySet_Next (or whatever the API was called) was a private API
  • It doesn't do the right thing for subclasses of sets with custom __iter__ functions

Similarly our current implementation doesn't respect dict subclasses with custom __iter__ functions? Should it? Probably, in which case we might just want to switch to PyObject_GetIter anyway.

davidhewitt avatar Aug 16 '24 20:08 davidhewitt

I opened https://github.com/PyO3/pyo3/pull/4477 with a different implementation of the FFI bindings.

ngoldbaum avatar Aug 23 '24 21:08 ngoldbaum

I opened https://github.com/PyO3/pyo3/pull/4477 with a different implementation of the FFI bindings.

Sorry for the late reply. Your implementation looks good, and the 'opaque_type!' use is exactly like what I was looking for. I will update my MR after the weekend.

bschoenmaeckers avatar Aug 23 '24 22:08 bschoenmaeckers

This use of the critical section api seems unwise. This allows users to create several critical sections (and worse) allows them to release them in arbitrary order. I don't think I understand the critical section api well but it seems guaranteed to cause issues.

I can see two obvious solutions:

  1. Replace the implementation with PyObject_GetIter and PyIter_Next (slow?)

  2. Implement some form of internal iteration:


impl PyDict{

    pub fn traverse<B>(&self, f: &mut impl FnMut(Bound<'py, PyAny>, Bound<'py, PyAny>) -> ControlFlow<B>) -> ControlFlow<B> {

        struct Guard { .. };

        impl Drop for Guard { ..release critical section }

        

        let mut cs = std::mem::MaybeUninit::zeroed();

        ffi::PyCriticalSection_Begin(cs.as_mut_ptr(), dict.as_ptr());

        let mut ma_used = ..;

        let mut di_used = ..;

        let key = ...;

        let value = ..;

        

        while PyDict_Next(...) != 0{

           f(key, value)?;

        }

        ControlFlow::Continue(())

    }

}




Interesting solutions 👀. I will try to implement the first one and test the performance hit after.

bschoenmaeckers avatar Aug 23 '24 22:08 bschoenmaeckers

CodSpeed Performance Report

Merging #4439 will not alter performance

Comparing bschoenmaeckers:PyDict_next_lock (e76c7b3) with main (133eac8)

Summary

✅ 83 untouched benchmarks

codspeed-hq[bot] avatar Aug 28 '24 17:08 codspeed-hq[bot]

ouch that does seem to be a big perf hit

ngoldbaum avatar Aug 28 '24 17:08 ngoldbaum

Yea this is really bad, but kind of expected as dict.items() creates a copy of the iterable and saves it into a PyList.

https://github.com/python/cpython/blob/main/Objects/dictobject.c#L3381-L3432

bschoenmaeckers avatar Aug 28 '24 18:08 bschoenmaeckers

I've also looked into iterating a raw dict but this only yields the keys. So it does not protect against modifications of the values before fetching them on the next call.

bschoenmaeckers avatar Aug 28 '24 19:08 bschoenmaeckers

I wonder if the critical section API is actually problematic in practice. You could try iterating over the same dict in many threads on the free-threaded build as a stress test. I'm not sure if there are other usage patterns that @mejrs might be concerned about.

It would be nice if we could still keep the fast path for dicts and then only degrade to the slow path if we're not handed an instance of PyDict_Type.

ngoldbaum avatar Aug 28 '24 19:08 ngoldbaum

Yea this is really bad, but kind of expected as dict.items() creates a copy of the iterable and saves it into a PyList.

https://github.com/python/cpython/blob/main/Objects/dictobject.c#L3381-L3432

dict.items() is equivalent to the Python 2 semantics where .items() in Python did create a new list. Is perf any better if you try .dict.call_method0("items") to get an iterable items view?

davidhewitt avatar Aug 28 '24 21:08 davidhewitt

Yea this is really bad, but kind of expected as dict.items() creates a copy of the iterable and saves it into a PyList. https://github.com/python/cpython/blob/main/Objects/dictobject.c#L3381-L3432

dict.items() is equivalent to the Python 2 semantics where .items() in Python did create a new list. Is perf any better if you try .dict.call_method0("items") to get an iterable items view?

I didn't know that this is different, learning something new every day! It is indeed somewhat faster. We went down from ~87% slowdown to ~63%.

bschoenmaeckers avatar Aug 28 '24 23:08 bschoenmaeckers

It would be nice if we could still keep the fast path for dicts and then only degrade to the slow path if we're not handed an instance of PyDict_Type.

I made the previous fast path available to non-freethreaded builds when the dict is not a subtype of PyDict. This gives us minimal performance regressions on existing code.

bschoenmaeckers avatar Aug 29 '24 16:08 bschoenmaeckers

Thank you for all your suggestions. Please review again if you have time. I've implemented fold and try_fold methods with critical sections. We probably want to run the full CI to validate my implementation. Furthermore I do not copy the dict on free-threaded builds and added a locked_for_each method. I will focus next on writing docs to guide people to away from using for x in dict on free-threaded.

bschoenmaeckers avatar Sep 27 '24 12:09 bschoenmaeckers

👍 thanks! I will try to read tonight or tomorrow evening!

davidhewitt avatar Sep 29 '24 16:09 davidhewitt