
[VM launcher] Ran `ray status` after SSHing into the head node and it printed "No cluster status"

Open · scottsun94 opened this issue on May 3, 2023 · 3 comments

What happened + What you expected to happen

The head node is started successfully:

Local node IP: 172.31.62.187

--------------------
Ray runtime started.
--------------------

Next steps
  To add another node to this Ray cluster, run
    ray start --address='172.31.62.187:6379'

  To connect to this Ray cluster:
    import ray
    ray.init()

  To submit a Ray job using the Ray Jobs CLI:
    RAY_ADDRESS='http://127.0.0.1:8265' ray job submit --working-dir . -- python my_script.py

  See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html
  for more information on submitting Ray jobs to the Ray cluster.

  To terminate the Ray runtime, run
    ray stop

  To view the status of the cluster, use
    ray status

  To monitor and debug Ray, view the dashboard at
    127.0.0.1:8265

  If connection to the dashboard fails, check your firewall settings and network configuration.
Shared connection to 34.223.114.236 closed.
  New status: up-to-date

I ran `ray status` after SSHing into the head node and it printed "No cluster status":

Last login: Wed May  3 21:47:54 2023 from {my laptop ip}
ubuntu@ip-172-31-62-187:~$ ray status
No cluster status.
ubuntu@ip-172-31-62-187:~$ exit
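
One cross-check I could try from the head node (a sketch; the address is the one printed by `ray start` above, and the flag is only there to rule out address auto-discovery issues) is to point `ray status` at the GCS explicitly:

# Sketch: query the GCS address printed by `ray start` directly
# instead of relying on address auto-discovery.
ray status --address 172.31.62.187:6379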

The YAML file is attached below.

cluster_name: 0503-3
max_workers: 2
provider:
    type: aws
    region: us-west-2
    cache_stopped_nodes: True
auth:
    ssh_user: ubuntu
available_node_types:
    ray.head.default:
        node_config:
            InstanceType: m5.2xlarge
    ray.worker.default:
        min_workers: 2
        max_workers: 2
        node_config:
            InstanceType: m5.2xlarge
head_node_type: ray.head.default
head_start_ray_commands:
    - ray stop
    - ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml --temp-dir=~/ray_temp_logs/
worker_start_ray_commands:
    - ray stop
    - ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076
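
For completeness, a rough sketch of the launcher workflow I'm following with this YAML (the file name cluster.yaml is assumed):

# Sketch of the reproduction steps with the YAML above (file name assumed).
ray up cluster.yaml -y       # launch the head node (and workers)
ray attach cluster.yaml      # SSH into the head node
ray status                   # prints "No cluster status."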

Versions / Dependencies

See the YAML file above.

Reproduction script

See the YAML file above.

Issue Severity

High: It blocks me from completing my task.

scottsun94 · May 3, 2023

cc: @gvspraveen @wuisawesome

scottsun94 · May 3, 2023

In the absence of error messages, I'm assuming this is a race condition where the `ray status` call happens before the autoscaler is fully up.

I assume this should get fixed in the autoscaler refactor? @scv119
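
If that's the case, a minimal workaround sketch (assuming the race-condition explanation, with arbitrary timing) would be to poll `ray status` for a minute or so after start-up:

# Sketch: retry `ray status` until the autoscaler has published a status,
# giving up after ~60 s (interval and timeout are arbitrary).
for i in $(seq 1 12); do
  out="$(ray status 2>&1)"
  echo "$out" | grep -q "No cluster status" || { echo "$out"; break; }
  sleep 5
done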

wuisawesome · May 4, 2023

This P2 issue has seen no activity in the past 2 years. It will be closed in 2 weeks as part of ongoing cleanup efforts.

Please comment and remove the pending-cleanup label if you believe this issue should remain open.

Thanks for contributing to Ray!

cszhu · Jun 17, 2025