redo connection error message

Open mattip opened this issue 2 years ago • 2 comments

Why are these changes needed?

Issue #34094 covers the network connection problems with ray startup. One of the conclusions was to improve the error message reported when the worker cannot connect via ray start --address 10.0.0.20:1234. It now prints the following (20 times):

[2023-06-01 10:42:56]  ERROR ray._raylet::Failed to connect to GCS. Please check
`gcs_server.out` for more details.
[2023-06-01 10:42:56]  WARNING ray._private.utils::1402 Unable to connect to GCS (ray head) at 
10.0.0.20:1234. Check that (1) Ray with matching version started successfully at the specified 
address, (2) this node can reach that address, and (3) there is no firewall setting preventing
access.
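
Point (2) of the warning, whether this node can reach the address, can be checked by hand before digging into logs. The following is a minimal sketch using only the Python standard library; `can_reach` is a hypothetical helper and not part of Ray, and a successful TCP connection only shows that something is listening on the port, not that it is a GCS of a matching version.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; not a Ray API.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the address in the log above, the call would be `can_reach("10.0.0.20", 1234)`.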

I am not sure printing it 20 times is needed. The number of repetitions is controlled by ray_constants.NUM_REDIS_GET_RETRIES, which is used by the script.
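
One way to keep the retries but avoid repeating the full message is to log each failed attempt briefly at debug level and emit the detailed warning once, after the last attempt. This is only a sketch of that idea and not necessarily what this PR or Ray implements: `connect_with_retries`, the `connect` callable, and `NUM_RETRIES` are hypothetical names, with `NUM_RETRIES` standing in for ray_constants.NUM_REDIS_GET_RETRIES.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)

# Hypothetical stand-in for ray_constants.NUM_REDIS_GET_RETRIES.
NUM_RETRIES = 20


def connect_with_retries(
    connect: Callable[[], None],
    address: str,
    num_retries: int = NUM_RETRIES,
    delay_s: float = 1.0,
) -> bool:
    """Call connect() up to num_retries times; warn in detail only once."""
    for attempt in range(1, num_retries + 1):
        try:
            connect()
            return True
        except ConnectionError as exc:
            # Short per-attempt note, hidden unless debug logging is enabled.
            logger.debug(
                "Attempt %d/%d to reach %s failed: %s",
                attempt, num_retries, address, exc,
            )
            if attempt < num_retries:
                time.sleep(delay_s)
    # Single detailed message after all attempts have failed.
    logger.warning(
        "Unable to connect to GCS (ray head) at %s after %d attempts. "
        "Check that (1) Ray with matching version started successfully at "
        "the specified address, (2) this node can reach that address, and "
        "(3) there is no firewall setting preventing access.",
        address, num_retries,
    )
    return False
```

With this shape, the user sees one warning per failed startup regardless of the retry count, while the per-attempt detail remains available for debugging.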

Related issue number

#34094

Checks

  • [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [x] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

mattip, Jun 01 '23 10:06

I don't think the test failures are related to this PR. Any thoughts?

mattip, Jun 12 '23 06:06

@jjyao can you help review this quick PR?

richardliaw, Jun 15 '23 17:06