skypilot
[Onprem] Add ability to specify custom ports for ssh
Currently, sky onprem assumes port 22 for SSH access. However, servers may use different ports, SSH services behind a load balancer may use different ports on the same IP, and the local mode container (#968, #1165) may need to run on a port other than 22 if the user already has an SSH server running.
Implementing this would require modifying the onprem YAML to support colon-separated IP:PORT entries, like:
cluster:
  ips: [my.local.cluster.hostname:22, 3.20.226.96:27015, 3.143.112.6:10000]
  name: my-local-cluster
auth:
  ssh_user: PLACEHOLDER
  ssh_private_key: PLACEHOLDER
python: /usr/bin/python3
If port is not specified, then we should default to port 22.
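A minimal sketch of how entries in the proposed format could be parsed, falling back to port 22 when no port is given. The function name `parse_ip_port` and the constant `DEFAULT_SSH_PORT` are hypothetical and not part of SkyPilot; IPv6 literals are deliberately not handled.

```python
from typing import Tuple

DEFAULT_SSH_PORT = 22  # default proposed in this issue


def parse_ip_port(entry: str,
                  default_port: int = DEFAULT_SSH_PORT) -> Tuple[str, int]:
    """Split 'HOST[:PORT]' into (host, port).

    Hypothetical helper for illustration only; IPv6 is not supported.
    """
    entry = entry.strip()
    host, sep, port_str = entry.rpartition(':')
    if not sep:
        # No colon in the entry: the whole string is the host.
        return entry, default_port
    if not host or not port_str.isdigit():
        raise ValueError(f'Invalid IP:PORT entry: {entry!r}')
    port = int(port_str)
    if not 1 <= port <= 65535:
        raise ValueError(f'Port out of range in entry: {entry!r}')
    return host, port


ips = ['my.local.cluster.hostname', '3.20.226.96:27015', '3.143.112.6:10000']
parsed = [parse_ip_port(ip) for ip in ips]
```

Here the first entry has no port, so it resolves to port 22, while the other two keep their explicit ports.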
In this format,
cluster:
  ips: [my.local.cluster.hostname:22, 3.20.226.96:27015, 3.143.112.6:10000]
it's not clear what these ports are used for. Before we implement this, can we survey how other systems support SSH ports?
Also, is this related to the "local mode", or is this a nice-to-have for the on-prem mode?
it's not clear what these ports are used for. Before we implement this, can we survey how other systems support SSH ports?
Good point - some systems (e.g. k8s services) use a new yaml field for each port to be exposed on the service:
ports:
- name: http
  protocol: TCP
  port: 80
IP:PORT appealed to me because (a) it is concise and (b) we have multiple IP addresses and each can have a different port. It's also commonly used for specifying sockets (such as for HTTP). But I agree, we should look at other systems before finalizing this.
Also, is this related to the "local mode", or is this a nice-to-have for the on-prem mode?
It's both, but largely a necessity for "local mode". If my local machine already has port 22 used by something else, our current implementation won't work.
If my local machine already has port 22 used by something else, our current implementation won't work.
Can we consider automatically picking a free port, to relieve users of the burden of picking one?
Yep! We will/should automatically find a port. Once we find that, we will pass the port number to sky onprem using this feature request
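One common way to find a free port automatically is to bind a socket to port 0 and let the OS choose. The sketch below assumes this approach; `find_free_port` is a hypothetical name, not an existing SkyPilot function. Note the inherent race: another process could take the port between this call and the moment the container's SSH server binds to it.

```python
import socket


def find_free_port(host: str = '127.0.0.1') -> int:
    """Ask the OS for a currently unused TCP port on `host`.

    Hypothetical helper for illustration; the port is released when the
    socket closes, so a later bind can still fail if another process
    takes the port in between.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS pick a free port
        return s.getsockname()[1]


ssh_port = find_free_port()
entry = f'127.0.0.1:{ssh_port}'  # could be passed on in the IP:PORT format
```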
The reason this format is underspecified is that it only covers SSH. Port conflicts can also happen on other Ray ports. Should we have a general port-finding solution?
On Sun, Oct 9, 2022 at 17:34 Romil Bhardwaj @.***> wrote:
Yep! We will/should automatically find a port. Once we find that, we will pass the port number to sky onprem using this feature request
I'm not sure I fully understand why we need to expose Ray ports for local mode or onprem. In local mode, we should not have conflicts with the Ray port because those are within the container and not exposed on the host. Only the ssh port of the container is exposed on host. This is the same interface (ssh address, key) we have to VMs running on the cloud.
Also, Ray appears to employ some kind of port collision avoidance (at least for the GCS server port 6379, tested by running an HTTP server in parallel to block that port).
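For anyone who wants to reproduce this kind of port-blocking test, a rough sketch of checking whether a TCP port is already taken on the local machine is to attempt to bind to it. `is_port_free` is a hypothetical helper; this says nothing about how Ray itself detects or avoids collisions.

```python
import socket


def is_port_free(port: int, host: str = '127.0.0.1') -> bool:
    """Return True if a TCP socket can currently bind to (host, port).

    Hypothetical helper for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
        return True


gcs_port_free = is_port_free(6379)  # 6379 is the GCS port mentioned above
```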
Since this is a blocker for local mode, I would be happy to implement this (if @michaelzhiluo is too busy). Are people okay with the IP:PORT format?
SGTM!
I'm running into an issue where ray up will hang if I run SSH services on a port other than 22. I suspect this is because ray up uses port 22 to SSH into cluster nodes. If this is the case and Ray does not provide any option to change it (I didn't see anything in the Ray documentation), then it might be difficult to implement this feature.
Any thoughts?
This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This issue was closed because it has been stalled for 10 days with no activity.
@romilbhardwaj this issue is relevant for K8s backend, right?
Yes, K8s needs custom ssh ports in the cloud_vm_ray_backend. It is fixed in the k8s_cloud branch.
This issue specifically tracks custom ssh ports for the onprem feature, which will no longer be required. This issue can be closed.