ray
[Core][WIP] Allow workers to lazily bind to job_id
Why are these changes needed?
This PR lets a worker bind to its job_id lazily instead of at startup. That allows us to prestart job-agnostic CPU workers in follow-up PRs. The main changes in this PR:
- change the worker side context (WorkerContext) to allow lazy binding to job_id.
- update the raylet side context (Worker) to allow lazy binding to job_id.
- change worker_pool to prestart workers without a job_id specified.
It also fixes a bug where runtime_env_hash was not set when prestarting workers.
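
The changes listed above can be pictured with a small sketch. This is not the code in this PR (the real WorkerContext, Worker and worker_pool are C++); the Python classes below only reuse those names for illustration, and the methods maybe_bind_job_id, prestart and pop_worker, as well as the string job ids and integer hashes, are assumptions made up for the example.

```python
from typing import List, Optional, Tuple


class WorkerContext:
    """Hypothetical stand-in for a worker context whose job_id may be unset."""

    def __init__(self, job_id: Optional[str] = None) -> None:
        # None means the worker was started before any job was assigned.
        self._job_id = job_id

    @property
    def job_id(self) -> Optional[str]:
        return self._job_id

    def maybe_bind_job_id(self, job_id: str) -> None:
        """Bind on first use; rebinding to a different job is an error."""
        if self._job_id is None:
            self._job_id = job_id
        elif self._job_id != job_id:
            raise ValueError(
                f"worker already bound to {self._job_id}, cannot bind to {job_id}"
            )


class WorkerPool:
    """Hypothetical pool that prestarts workers without a job_id."""

    def __init__(self) -> None:
        # Each idle entry records the runtime_env_hash the worker started with.
        self._idle: List[Tuple[int, WorkerContext]] = []

    def prestart(self, count: int, runtime_env_hash: int = 0) -> None:
        # The hash is stored at prestart time so later matching can use it.
        for _ in range(count):
            self._idle.append((runtime_env_hash, WorkerContext()))

    def pop_worker(
        self, job_id: str, runtime_env_hash: int = 0
    ) -> Optional[WorkerContext]:
        """Return an idle worker for the job, binding it lazily if unbound."""
        for i, (env_hash, ctx) in enumerate(self._idle):
            if env_hash != runtime_env_hash:
                continue
            if ctx.job_id is None or ctx.job_id == job_id:
                ctx.maybe_bind_job_id(job_id)
                del self._idle[i]
                return ctx
        return None  # the caller would start a new worker in this case
```

In this sketch a prestarted worker stays job-agnostic until the first pop_worker call that matches its runtime_env_hash; after that it only serves the job it was bound to.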
TBD: add tests
Related issue number
Checks
- [ ] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- [ ] I've run scripts/format.sh to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
Lmk when this is not WIP anymore!
ah this run_on_all_workers is annoying
Synced offline. We will start with 1 file after a job is assigned to workers. After that, I will follow up before Ray 2.3 on whether there is a way to avoid losing logs written before jobs are assigned.
Most of the test failures are around logging. We need to merge https://github.com/ray-project/ray/pull/31772 first.
The test results look much more promising after the logging PR. Btw, we should also change the logging doc. Maybe we can do it as a follow-up (or within this PR). https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure