
channel_send_wait()

Open ericsnowcurrently opened this issue 7 years ago • 17 comments

Currently the low-level function to "send" an object over a channel is fire-and-forget. We need a way to do the same thing but block until the object is received.

Solutions:

  1. return a lock from channel_send() (and ignore it for the current functionality)
  2. add a separate channel_send_wait() to explicitly wait
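The two options could be sketched at the Python level like this (a single-process simulation using `threading` and `queue` to stand in for the real cross-interpreter machinery; `FakeChannel` and its methods are hypothetical, not the actual API):

```python
import queue
import threading

class FakeChannel:
    """Single-process stand-in for an interpreter channel."""

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, obj):
        # Option 1: fire-and-forget, but also return a lock that the
        # receiver releases; callers that don't care simply ignore it.
        received = threading.Lock()
        received.acquire()  # held until the object is received
        self._queue.put((obj, received))
        return received

    def send_wait(self, obj, timeout=None):
        # Option 2: a separate blocking variant layered on send().
        received = self.send(obj)
        return received.acquire(timeout=-1 if timeout is None else timeout)

    def recv(self):
        obj, received = self._queue.get()
        received.release()  # signal the sender that the object arrived
        return obj
```

Ignoring the lock returned by `send()` preserves today's fire-and-forget behavior, while `send_wait()` builds the blocking variant on top of it.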

ericsnowcurrently avatar Aug 24 '18 19:08 ericsnowcurrently

Hi, I've been helping out with implementing the channel_list_interpreters API, and was wondering if it would be ok for me to also have a go at implementing this?

benedwards14 avatar Nov 21 '19 16:11 benedwards14

I would love it if you did it! :) I was working on this as recently as April (leading up to PyCon), but tabled it when I decided at PyCon to focus on making the GIL per-interpreter.

Apparently I was trying different approaches as I found that I have 4 different related branches:

https://github.com/ericsnowcurrently/cpython/tree/channel-send-wait https://github.com/ericsnowcurrently/cpython/tree/channel-send-wait-lock https://github.com/ericsnowcurrently/cpython/tree/channel-send-wait-callback https://github.com/ericsnowcurrently/cpython/tree/channel-send-wait-old

The first one is the one I was actually working on and the others were likely failed experiments. I'm sure they all have a lot of overlap. At the very least they can provide you direction. Hopefully you can pick up where I left off though. :)

I'd be glad to answer any questions you have on that, or even pair up with you for a little while if that would help. Just keep in mind that since April nearly all the details have been completely flushed from my mental cache. :)

ericsnowcurrently avatar Nov 22 '19 22:11 ericsnowcurrently

Hi Eric,

My current plan with this is to implement a new API, channel_send_wait(), which has an optional timeout parameter. This would attach a lock to the data that is sent, which is released when the data is received. The sender would then wait for the lock to be released or until the timeout expires.

Let me know what you think?

benedwards14 avatar Dec 11 '19 11:12 benedwards14

Thanks for that update. The approach sounds reasonable. I've a few comments below. Keep in mind that I tried that approach out and decided to do it differently. I'm not sure why, but I'll try to find a reason for you. Anyway...

My current plan with this is to implement a new API channel_send_wait()

That's reasonable. My implementation was re-using channel_send(), but I don't recall why I did that instead of a separate function. Maybe I figured it was unnecessary (not distinct enough?). Ultimately this is an internal API with a very specific use, so I don't think the approach matters too much. (Plus we can change it later if it's a problem.)

which has an optional timeout parameter.

A timeout parameter makes sense if the function itself waits for the operation to complete (and doesn't expose a lock or anything). That is certainly how the high-level API will work. However, the low-level API doesn't necessarily need to take the same approach. Regardless, there certainly needs to be a way for the caller to provide a timeout before waiting.

This would then add a lock to the data that is sent which is released whenever it is received. The sender would then wait for the lock to be released or until the timeout is done.

This almost sounds like you are saying the function returns a lock (i.e. threading.Lock). However, the timeout parameter implies otherwise. If channel_send_wait() returns a lock then the caller would use that and would not need to pass a timeout to channel_send_wait(). If it waits then you need to be sure to provide the caller with some indicator of whether the timeout was hit (e.g. return False if timed out).

Either way the trick is in how that lock is shared between interpreters. More on that in a sec.

ericsnowcurrently avatar Dec 13 '19 20:12 ericsnowcurrently

Just to clarify, the approach I settled on was to return a "wait" function from channel_send(). From my branch:

PyDoc_STRVAR(channel_send_doc,
"channel_send(cid, obj)\n\
\n\
Return a wait func after adding the object's data to the channel's\n\
queue.  The wait func blocks until the sent object has been received\n\
on the recv end of the channel (or the channel is closed).  The\n\
function optionally takes a timeout.  If no timeout is given then the\n\
function blocks indefinitely.  When the sent object is received, the\n\
wait func returns True.  If a timeout was provided and the object was\n\
not received in the requested time then wait() returns False.  The\n\
wait func may be called any number of times.");
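At the Python level, that contract could be exercised like the sketch below, a simulation in which `threading.Event` plays the role of the shared low-level lock (its `wait(timeout)` already returns `True`/`False` and may be called any number of times); the `channel` list and function bodies here are stand-ins, not the real implementation:

```python
import threading

def channel_send(channel, obj):
    """Simulated send that returns a wait func, as in the docstring."""
    received = threading.Event()
    channel.append((obj, received))

    def wait(timeout=None):
        # Blocks until the sent object is received (or the timeout
        # expires).  Returns True once received, False on timeout,
        # and is safe to call repeatedly.
        return received.wait(timeout)

    return wait

def channel_recv(channel):
    obj, received = channel.pop(0)
    received.set()  # wake current and future waiters
    return obj
```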

I'm pretty sure that implementation was fairly close to complete. However, at this point I have no recollection of what was left to be done.

If I remember right, here are the approaches that I tried out:

  • blocks until object received or timeout reached
    • channel_send_wait(cid, obj, timeout=None) => bool
    • channel_send(cid, obj, timeout=0) => bool
      • returns False if timeout reached
    • channel_send_wait(cid, obj, timeout=None)
      • raises something like ChannelTimeoutError if timeout reached
    • good: leaks few implementation details
  • returns something that can be waited on
    • channel_send(cid, obj) => threading.Lock
      • caller can use the lock like normal, for better or worse
      • bad: can't share the lock object between interpreters or its underlying lock (e.g. pthread_lock) without some significant changes to threading.Lock
    • channel_send(cid, obj) => ChannelLock
      • good: can be more focused than threading.Lock
      • we only need one operation ("wait with timeout"), so why not just return a function...
    • channel_send(cid, obj) => waitfunc
      • waitfunc(timeout=None) => bool
      • good: stays focused on the needed use case
      • good: leaks fewer of the implementation details
      • good: like channel_send_wait(cid, obj, timeout=None) => bool but keeps the send separate from the waiting
    • good: more flexible
    • good: caller can ignore the returned object for no-wait behavior

As noted, I decided to go with that last one.

ericsnowcurrently avatar Dec 13 '19 20:12 ericsnowcurrently

So I opted to go with a solution that kept the "send" and "wait" operations separate. For a low-level function this would provide the most flexibility and aligns with how other low-level functions are designed.

I also chose to re-use channel_send(). For a low-level function this seemed more appropriate. However, I'm still not absolutely sure why I did it. It's probably because I didn't see the point of separate functions, given the approach I was taking.

ericsnowcurrently avatar Dec 13 '19 20:12 ericsnowcurrently

Some key points to be aware of:

  • this low-level function will only be used by the high-level implementation of PEP 554 (just like we discourage use of the low-level _thread module)
  • the implementation must be safe in a world where subinterpreters do not share a GIL; that means they cannot share objects and must be thread-safe in anything they do share (like low-level locks)
  • the shared lock must be destroyed by the same interpreter that created it

I'll add more points if I think of any.

ericsnowcurrently avatar Dec 13 '19 20:12 ericsnowcurrently

Anyway, unless there is a better alternative, I'd rather we take the same approach that I was pursuing before (regardless of whether my implementation is used). However, I'd be glad to discuss this further.

ericsnowcurrently avatar Dec 13 '19 20:12 ericsnowcurrently

Hi Eric,

Thanks so much for all of this, it's really helpful. I did not realise this was going to be an internal API, so I think I will make a couple of changes.

So I think my new design should be channel_send_wait(cid, obj, timeout=-1) => bool:

  • I'm going with implementing a new internal API because channel_send_wait() has different functionality from channel_send(), which just fires and forgets
  • I also went with returning a boolean; from your messages I got the sense that there are no current use cases just yet for returning a lock, so let me know if that's ok
  • For the parameters and return:
    • timeout:
      • if -1 then wait forever
      • Otherwise wait for that long for the message to be received
    • Returns:
      • True if message is received
      • False if message has not been received by the end of the timeout

Internally my plan was to add a PyThread_type_lock structure to _channelitem, which would be acquired by the sender and released by the receiver when they get the message. Then, once the message has been sent, the sender will try to acquire the lock again using PyThread_acquire_lock_timed().

I think this approach should work. Anyway let me know what you think.
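That acquire/release dance can be sketched in pure Python, with `threading.Lock` standing in for `PyThread_type_lock` and `lock.acquire(timeout=...)` for `PyThread_acquire_lock_timed()`; the `_ChannelItem` class and list-based channel here are simplifications, not the C implementation:

```python
import threading
import time  # used by a receiver running in another thread

class _ChannelItem:
    """Stand-in for the C _channelitem with an attached lock."""

    def __init__(self, obj):
        self.obj = obj
        self.lock = threading.Lock()  # plays the role of PyThread_type_lock

def send_wait(items, obj, timeout=-1):
    item = _ChannelItem(obj)
    item.lock.acquire()  # sender takes the lock before sending
    items.append(item)   # "send": the item is now visible to the receiver
    # Timed re-acquire: succeeds only once the receiver releases the lock.
    received = item.lock.acquire(timeout=timeout)
    if received:
        item.lock.release()  # we only needed the signal, not the lock
    return received

def recv(items):
    item = items.pop(0)
    item.lock.release()  # tell the sender the object was received
    return item.obj
```

With timeout=-1 the sender blocks indefinitely, matching the proposed "wait forever" convention.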

benedwards14 avatar Dec 14 '19 17:12 benedwards14

@benedwards14, FYI, I'm planning on getting this wrapped up by the end of next week. I'd love to take advantage of the work you have done. Where are things at?

ericsnowcurrently avatar Apr 17 '20 20:04 ericsnowcurrently

Hi @ericsnowcurrently, to be honest with you I hit a snag around Christmas and stopped working on it, but after spending a bit more time on it, I think I've got it working.

I'm going to spend a bit more time on it tomorrow and hopefully put up a pull request by end of tomorrow.

benedwards14 avatar Apr 18 '20 22:04 benedwards14

Hey @ericsnowcurrently, I managed to get my implementation done. It's still rough around the edges but I think it should work.

Along with the bits I mentioned above, I've added what I call a wait_lock to the channel item, which consists of some status bits, a mutex, and the lock itself. The reason I did this is to handle the case where channel_send_wait() times out and the item is then received: the receiver needs to know whether the lock has already been freed, which became an interesting exercise in how not to seg fault.
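One way to picture that wait_lock idea in Python (a hypothetical sketch, not the actual C code: `threading.Event` stands in for the lock, a mutex guards the status bits, and `freed` models the deferred cleanup that avoids the use-after-free):

```python
import threading

class WaitLock:
    """Status bits + mutex + lock: whichever side finishes last
    cleans up, so neither side touches already-freed state."""

    def __init__(self):
        self._mutex = threading.Lock()      # guards the status bits
        self._received = threading.Event()  # the "lock" the receiver releases
        self._sender_done = False
        self._receiver_done = False
        self.freed = False                  # models freeing the shared state

    def _maybe_free(self):
        # Only the last side to finish frees the shared state.
        if self._sender_done and self._receiver_done:
            self.freed = True

    def sender_wait(self, timeout=None):
        received = self._received.wait(timeout)  # may time out
        with self._mutex:
            self._sender_done = True
            self._maybe_free()
        return received

    def receiver_release(self):
        self._received.set()
        with self._mutex:
            self._receiver_done = True
            self._maybe_free()
```

Even if the sender times out and goes away, the receiver can still release safely; the shared state is freed only once both sides are done.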

How would you like me to put this up for review? Is there a BPO issue or anything?

benedwards14 avatar Apr 19 '20 19:04 benedwards14

the shared lock must be destroyed by the same interpreter that created it

Could I also ask why this must be the case?

benedwards14 avatar Apr 20 '20 09:04 benedwards14

This is great, @benedwards14! Yeah, I'd love to see a PR. If it's okay with you, I'll hold off on feedback here in favor of reviewing the PR.

I don't see any issues open on BPO, so feel free to open a new one there (and nosy me).

ericsnowcurrently avatar Apr 20 '20 20:04 ericsnowcurrently

As to why we destroy the shared lock in the original interpreter, it is because the lock should be deallocated using the same allocator that allocated its memory. For now there is one process-global allocator guarded by a single process-wide GIL. However, eventually memory allocators (and the GIL) will be per-interpreter. So we are doing the work now to ensure that we handle memory safely in that future world.

ericsnowcurrently avatar Apr 20 '20 20:04 ericsnowcurrently

Hi @ericsnowcurrently , I have created a BPO and pull request here: https://bugs.python.org/issue40390 https://github.com/python/cpython/pull/19715

I'm a bit unsure about the fields I chose for the bug, so it would be worth checking them.

benedwards14 avatar Apr 25 '20 19:04 benedwards14

Thanks, @benedwards14. I updated the BPO title and component and am starting a review of the PR.

ericsnowcurrently avatar Apr 28 '20 19:04 ericsnowcurrently