ray
ray copied to clipboard
[Core] many_nodes_actor_test_on_v2.aws_1724742913 failed due to dashboard issue
What happened + What you expected to happen
It is displayed as a transient infra error, but I found the root cause is the job API is failed for some reasons. I suspect there's dashboard issue in this test (could be related to overloaded dashboard)
Ray job 'raysubmit_A1fzwu3jMAQhJ3hC' completed with an error: Unexpected error occurred: RPC Error message: Deadline Exceeded; RPC Error details:
--
If you look at logs, you can see the job actually finished
https://console.anyscale-staging.com/v2/cld_kvedZWag2qA8i5BjxUevf5i7/prj_rw3kzbqsehh4a6klka7j616fgb/jobs/prodjob_s3wznrpbt2fmhmbkv4grezljd9?job-logs-section-tabs=application_logs&job-tab=overview
Versions / Dependencies
master
Reproduction script
n/a
Issue Severity
None
It has been passing for 2 months.
same problem
@jjyao Can you share how to solve it?
@workingloong are you still seeing this error?