ray
[Core] Ray-based concurrent.futures.Executor
Why are these changes needed?
This PR provides RayExecutor, a drop-in replacement for ProcessPoolExecutor and ThreadPoolExecutor from concurrent.futures, which executes tasks on a Ray cluster.
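Since `RayExecutor` is meant as a drop-in replacement, client code should look identical to the stdlib executors. The sketch below shows the standard `concurrent.futures` pattern that `RayExecutor` would slot into; the `RayExecutor` import path shown in the comment is an assumption (the final name and signature may differ from this PR), so the runnable part uses `ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical drop-in usage per this PR (import path assumed, not final):
#   from ray.util.concurrent.futures import RayExecutor
#   with RayExecutor() as ex: ...
# The same client code runs unchanged against any Executor subclass:

def slow_square(x: int) -> int:
    return x * x

with ThreadPoolExecutor(max_workers=4) as ex:
    # submit() returns a Future; map() preserves input order.
    fut = ex.submit(slow_square, 3)
    squares = list(ex.map(slow_square, range(5)))

print(fut.result())   # 9
print(squares)        # [0, 1, 4, 9, 16]
```

Because the `Executor` interface is the contract, swapping the backend only changes the constructor line.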
Related issue number
Closes #29456
Checks
- [x] I've signed off every commit (using the `-s` flag, i.e. `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- [x] Unit tests
- [x] Release tests
- [ ] This PR is not tested :(
The three failing tests don't seem to have anything to do with our contribution. The Docker Bootstrap test, at the very least, failed because Ubuntu failed to update.
Perhaps someone at Ray could shed some light on the failures?
sorry for the delay. I will review this tomorrow!
@rkooo567 can you take another look at this?
Hmm, actually, before deep-diving into the review: does it currently only support submitting one task? It looks like the current subclasses of Executor (ProcessPoolExecutor and ThreadPoolExecutor) both have the concept of a pool (max_workers). Should we have the same thing?
Thanks @rkooo567! I've refactored quite a bit to make multiple task submission much clearer. The executor now has a `max_workers` argument which will execute tasks over an actor pool. If that kwarg is not specified, multiple tasks can still be submitted to the same RayExecutor instance, but scheduling will be managed entirely by Ray (i.e. no fixed set of actors/workers).
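The two scheduling modes described above can be sketched in plain Python: with `max_workers` set, tasks are cycled round-robin over a fixed pool of workers; without it, submission goes to a single unconstrained scheduler. All names here are illustrative stand-ins, not the PR's actual internals, and plain thread executors stand in for Ray actors.

```python
import itertools
from typing import Optional
from concurrent.futures import ThreadPoolExecutor, Future

class PooledDispatcher:
    """Illustrative only: round-robin dispatch over a fixed pool, loosely
    analogous to scheduling over an actor pool when max_workers is given.
    Not the PR's actual implementation."""

    def __init__(self, max_workers: Optional[int] = None):
        if max_workers is not None:
            # Fixed pool: one single-threaded executor stands in for each worker.
            self._pool = [ThreadPoolExecutor(max_workers=1) for _ in range(max_workers)]
            self._next = itertools.cycle(self._pool)
        else:
            # No pool: one shared executor schedules freely.
            self._pool = None
            self._ex = ThreadPoolExecutor()

    def submit(self, fn, *args, **kwargs) -> Future:
        target = next(self._next) if self._pool else self._ex
        return target.submit(fn, *args, **kwargs)

    def shutdown(self):
        for ex in (self._pool or [self._ex]):
            ex.shutdown()

d = PooledDispatcher(max_workers=2)
results = [d.submit(pow, 2, i).result() for i in range(4)]
d.shutdown()
print(results)  # [1, 2, 4, 8]
```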
thanks for the update and sorry for the delay! I am planning to review this tomorrow (have been off around new year!)
btw there are lots of test failures. Do you mind merging the latest master?
I made a first pass! Thanks for addressing the initial feedback.
Thanks @rkooo567! We'll get to these comments ASAP.
Thanks for your prompt replies @rkooo567. I'm busy with a little bit of a refactor again to remove `ActorPool`s, as you suggested (and to clarify the object store issues). This shouldn't take me longer than a few days at most, but I'll ping you here when I've done all of that. 👍🏻
Sgtm! Let me know! I will follow up with the rest of comments when you get back to me :). Really excited for this contribution!
I'm busy with this again, shouldn't be long now :D
@rkooo567 @jjyao I've simplified this PR quite a bit by removing actor pools, and I've responded to the majority of your queries above. It seems to me that the major outstanding issue is dealing with shutdowns.
@rkooo567, you suggested (https://github.com/ray-project/ray/pull/30826#discussion_r1089608649) that
- if ray is already initialised we don't shut it down (on exiting the context), but
- if it is not initialised we shut it down.
I'm not sure I understand the flow of logic there. Could you perhaps offer an example of how we get into either of those states with the executor?
@jjyao you suggested (https://github.com/ray-project/ray/pull/30826#discussion_r1090198139) that
- we simply remove the ray shutdown functionality entirely, because
- ray will automatically shutdown when the process ends, and
- users could call `ray.shutdown()` if they really wanted to.
I think it would be ideal here to hide any unnecessary Ray interactions from the users. So, are there any situations that come to mind in which a user could get into trouble by not realising that there is a Ray cluster continuing to run in the background? It feels more like a real drop-in replacement for the other Executor subclasses if no artefacts remain after the computation finishes.
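One common reading of the shutdown suggestion above is ownership tracking: on exit, shut the backend down only if the executor itself started it, and leave it running if the caller initialised it beforehand. A stdlib-only sketch of that pattern, with a plain flag standing in for `ray.init()`/`ray.shutdown()` state (Ray itself is not called, and `ManagedExecutor` is a hypothetical name):

```python
class FakeBackend:
    """Stand-in for the Ray runtime; tracks init/shutdown-like state."""
    def __init__(self):
        self.initialized = False

class ManagedExecutor:
    """Illustrative only: tear the backend down on exit only if we started it.
    If the caller initialised the backend beforehand, leave it running."""

    def __init__(self, backend: FakeBackend):
        self._backend = backend
        self._owns_backend = False

    def __enter__(self):
        if not self._backend.initialized:
            self._backend.initialized = True   # analogous to ray.init()
            self._owns_backend = True
        return self

    def __exit__(self, *exc):
        if self._owns_backend:
            self._backend.initialized = False  # analogous to ray.shutdown()
        return False

# Case 1: the executor starts the backend, so it also stops it.
b1 = FakeBackend()
with ManagedExecutor(b1):
    pass
print(b1.initialized)  # False

# Case 2: the backend was already running; the executor leaves it alone.
b2 = FakeBackend()
b2.initialized = True
with ManagedExecutor(b2):
    pass
print(b2.initialized)  # True
```

Under this reading, a user who called `ray.init()` themselves keeps their cluster after the `with` block, while casual users see no leftover state.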
Thanks for all the reviews so far 😎
Sorry, I forgot about the PR for a while! I will take a look at it tomorrow!
Hiyo, just wanted to say that this PR would be very much appreciated whenever you find yourself with some more free time!
Sorry I was a bit busy recently. I will review it asap. Sorry for the delay!
@rkooo567 any update on this?
Hi @nidabdella,
Could you reply to the comments I posted above? We can also schedule a meeting to go through the design and make sure we are aligned.
Thanks @jjyao, I've reverted to an Actor-based model. In the meantime, I've pinged you on Slack. Let's try and organise a chat to make sure we're on the same page.
I am a bit occupied lately, so I will rely on @jjyao to finish the review!
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
- If you'd like to keep this open, just leave any comment, and the stale label will be removed.
Hi @jeicher, do you have any updates on this?
Hi @jjyao, thanks for checking. I am on leave at the moment so it will be a while before I get to this again.
Hey guys, I'll be back from leave soon. Hoping to get to this in the near future!
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.
Please feel free to reopen or open a new issue if you'd still like it to be addressed.
Again, you can always ask for help on our discussion forum or Ray's public slack channel.
Thanks again for opening the issue!
Just chiming in to say I love the idea :)
@fkaleo I'll have a finished solution for this very soon 👍🏻