[Core][WIP] allow worker lazily bind to job_id

Open scv119 opened this issue 2 years ago • 2 comments

Why are these changes needed?

This PR lets workers bind to a job_id lazily, which will allow us to prestart job-agnostic CPU workers in follow-up PRs. The main changes in this PR:

  1. Change the worker-side context (WorkerContext) to allow lazy binding to job_id.
  2. Update the raylet-side context (Worker) to allow lazy binding to job_id.
  3. Change worker_pool to prestart workers without a job_id specified.

It also fixes a bug where runtime_env_hash is not set when prestarting workers.
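
For readers unfamiliar with the idea, the changes listed above can be pictured with a rough Python sketch. This is not the code in this PR (the real WorkerContext, Worker, and worker_pool live in Ray's C++ core); the class shapes, the method names `maybe_bind_job_id`, `prestart`, and `pop_worker`, and the string job IDs are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


class WorkerContext:
    """Hypothetical stand-in for a worker context that may start without a job."""

    def __init__(self, job_id: Optional[str] = None) -> None:
        # None means the worker is job-agnostic (not yet bound to any job).
        self._job_id = job_id

    @property
    def job_id(self) -> Optional[str]:
        return self._job_id

    def maybe_bind_job_id(self, job_id: str) -> None:
        """Bind on first use; rebinding to a different job is an error."""
        if self._job_id is None:
            self._job_id = job_id
        elif self._job_id != job_id:
            raise ValueError(
                f"worker already bound to {self._job_id}, cannot bind to {job_id}"
            )


class WorkerPool:
    """Hypothetical pool that prestarts workers with no job_id specified."""

    def __init__(self) -> None:
        # Each idle entry keeps the runtime_env_hash it was started with.
        self._idle: List[Tuple[int, WorkerContext]] = []

    def prestart(self, count: int, runtime_env_hash: int = 0) -> None:
        # Record runtime_env_hash at prestart time so later matching works.
        for _ in range(count):
            self._idle.append((runtime_env_hash, WorkerContext()))

    def pop_worker(
        self, job_id: str, runtime_env_hash: int = 0
    ) -> Optional[WorkerContext]:
        """Return an idle worker for job_id, binding it lazily if unbound."""
        for i, (env_hash, ctx) in enumerate(self._idle):
            if env_hash != runtime_env_hash:
                continue
            if ctx.job_id is None or ctx.job_id == job_id:
                del self._idle[i]
                ctx.maybe_bind_job_id(job_id)
                return ctx
        return None


pool = WorkerPool()
pool.prestart(2)                    # started before any job exists
worker = pool.pop_worker("job-1")   # bound to a job only when first needed
```

The design point the sketch tries to capture is that binding happens once, at the moment a prestarted worker is handed to a job, and that an already-bound worker is never silently moved to another job.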

TBD: add tests

Related issue number

Checks

  • [ ] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

scv119 avatar Dec 05 '22 06:12 scv119

Lmk when this is not WIP anymore!

rkooo567 avatar Dec 06 '22 07:12 rkooo567

ah this run_on_all_workers is annoying

scv119 avatar Dec 06 '22 23:12 scv119

Synced offline. We will start with one file once a job is assigned to a worker. After that, I will follow up on whether there is a way to avoid losing logs emitted before a job is assigned, ideally before Ray 2.3.

rkooo567 avatar Jan 18 '23 23:01 rkooo567

Most of the test failures are around logging. We need to merge https://github.com/ray-project/ray/pull/31772 first.

scv119 avatar Jan 19 '23 19:01 scv119

Test results look much more promising after the logging PR. Btw, we should also update the logging doc. Maybe we can do it as a follow-up (or within this PR): https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure

rkooo567 avatar Jan 20 '23 09:01 rkooo567