Chen Shen
@liuyang-my the Java test failed, but I'm not quite sure what exactly happened from reading the logs. Do you know what might have gone wrong? (presumably we are hitting some deadlock...
thanks for reviewing! kicking off benchmark here: https://buildkite.com/ray-project/release-tests-pr/builds/24753
No external behavior change; the only thing that changes is that we delay the job_config binding until it receives the first task_spec.
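For anyone skimming, here is a minimal sketch of the deferred-binding idea, not the actual code in this PR. `LazyWorker`, `JobConfig`, `TaskSpec`, and `handle_task` are hypothetical names, and the sketch assumes the task_spec carries the job_config it should run under.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobConfig:
    # Hypothetical stand-in for the real job_config.
    runtime_env: str


@dataclass(frozen=True)
class TaskSpec:
    # Hypothetical stand-in; assumes the task_spec carries its job's config.
    task_id: int
    job_config: JobConfig


class LazyWorker:
    """Binds job_config on the first task_spec instead of at startup."""

    def __init__(self) -> None:
        # Left unbound until the first task arrives.
        self.job_config: Optional[JobConfig] = None

    def handle_task(self, spec: TaskSpec) -> str:
        if self.job_config is None:
            # First task_spec seen: bind now, exactly once.
            self.job_config = spec.job_config
        return f"task {spec.task_id} ran with {self.job_config.runtime_env}"


worker = LazyWorker()
config = JobConfig(runtime_env="dummy-env")
first = worker.handle_task(TaskSpec(task_id=1, job_config=config))
second = worker.handle_task(TaskSpec(task_id=2, job_config=config))
```

Callers only ever see `handle_task`, so the later binding is not visible from outside, which is the sense in which external behavior stays the same.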
Ran the single_node microbenchmark with a dummy runtime env (full script: https://gist.github.com/scv119/dc5ff45f9172f4ac4e548cc2c57bc460). With this [PR](https://console.anyscale-staging.com/o/anyscale-internal/projects/prj_qC3ZfndQWYYjx2cz8KWGNUL4/clusters/ses_wih7wv9yg61wp3g7354yhqldci?command-history-section=command_history): ``` Many args time: 17.709704568999996 (10000 args) Many returns time: 6.086816375999888 (3000 returns) Ray.get time: 25.223439912999993...
I'll disable the Java test for now. @MisterLin1995 will fix it in the follow-up PRs #31590 and #31593.
@iycheng for initial triaging.
Thanks for the detailed investigation. I do feel this regression is tolerable since it's all memory access. Also, am I understanding correctly that, compared to the added time to query owner information, the...
> 1. Ray is running and the master failed => GCS won't be back. (This can be improved in the future with more work.)
> 2. Start a new Ray cluster,...
> Do we need extra context at all? Why don't we just combine it into the existing context obj?

Yeah, I think we should do them incrementally.
Will continue this after Ray 2.3.