[Onprem] Add ability to specify custom ports for ssh

Open romilbhardwaj opened this issue 2 years ago • 6 comments

Currently, sky onprem assumes port 22 for ssh access. However, servers may use different ports, ssh services behind a load balancer may use different ports on the same IP, and a local mode container (#968, #1165) may run on a port other than 22 if the user already has an ssh server running.

Implementing this would require modifying the onprem YAML to support colon-separated IP:PORT entries like:

cluster:
  ips: [my.local.cluster.hostname:22, 3.20.226.96:27015, 3.143.112.6:10000]
  name: my-local-cluster

auth:
  ssh_user: PLACEHOLDER
  ssh_private_key: PLACEHOLDER

python: /usr/bin/python3

If port is not specified, then we should default to port 22.
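
A minimal sketch of how such entries might be parsed, assuming each entry is a plain `host` or `host:port` string. The helper name `parse_ip_port` and the constant `DEFAULT_SSH_PORT` are hypothetical and not part of SkyPilot; bare IPv6 addresses are not handled.

```python
from typing import Tuple

# Hypothetical constant: the default proposed in this issue.
DEFAULT_SSH_PORT = 22


def parse_ip_port(entry: str,
                  default_port: int = DEFAULT_SSH_PORT) -> Tuple[str, int]:
    """Split 'host' or 'host:port' into (host, port).

    Falls back to default_port when no port is given.
    """
    entry = entry.strip()
    # Split on the last colon so the host part is kept intact.
    host, sep, port_str = entry.rpartition(':')
    if not sep:
        # No colon at all: the whole entry is the host.
        return entry, default_port
    if not host or not port_str.isdigit():
        raise ValueError(f'Invalid IP:PORT entry: {entry!r}')
    port = int(port_str)
    if not 1 <= port <= 65535:
        raise ValueError(f'Port out of range in entry: {entry!r}')
    return host, port
```

With the YAML above, `parse_ip_port('3.20.226.96:27015')` would give `('3.20.226.96', 27015)`, and an entry without a port would fall back to 22.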

romilbhardwaj avatar Oct 07 '22 22:10 romilbhardwaj

In this format,

cluster:
  ips: [my.local.cluster.hostname:22, 3.20.226.96:27015, 3.143.112.6:10000]

it's not clear what these ports are used for. Before we implement this, can we survey how other systems support SSH ports?

Also, is this related to the "local mode", or is this a nice-to-have for the on-prem mode?

concretevitamin avatar Oct 09 '22 15:10 concretevitamin

> it's not clear what these ports are used for. Before we implement this, can we survey how other systems support SSH ports?

Good point - some systems (e.g. k8s services) use a new yaml field for each port to be exposed on the service:

ports:
  - name: http
    protocol: TCP
    port: 80

IP:PORT appealed to me because (a) it is concise and (b) we have multiple IP addresses and each can have a different port. It's also commonly used for specifying sockets (such as for http). But I agree, we should look at other systems before finalizing this.

> Also, is this related to the "local mode", or is this a nice-to-have for the on-prem mode?

It's both, but largely a necessity for "local mode". If my local machine already has port 22 used by something else, our current implementation won't work.

romilbhardwaj avatar Oct 09 '22 16:10 romilbhardwaj

> If my local machine already has port 22 used by something else, our current implementation won't work.

Can we consider automatically picking a free port, to spare users the burden of choosing one?

concretevitamin avatar Oct 10 '22 00:10 concretevitamin

Yep! We will/should automatically find a port. Once we find one, we will pass the port number to sky onprem using the feature requested in this issue.
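
A rough sketch of one common way to find a free port, assuming it is acceptable to ask the OS for an ephemeral port by binding to port 0. The helper name `find_free_port` is hypothetical and this is not necessarily how local mode does or would do it.

```python
import socket


def find_free_port(host: str = '127.0.0.1') -> int:
    """Ask the OS for a currently unused TCP port on host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 lets the OS choose any free port.
        s.bind((host, 0))
        return s.getsockname()[1]
```

Note that the port is released when the socket closes, so another process could grab it before the ssh server binds to it; callers may want to retry on failure.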

romilbhardwaj avatar Oct 10 '22 00:10 romilbhardwaj

This format is underspecified because it covers only SSH. Port conflicts can also happen with other Ray ports. Should we have a general port-finding solution?

concretevitamin avatar Oct 10 '22 00:10 concretevitamin

I'm not sure I fully understand why we need to expose Ray ports for local mode or onprem. In local mode, we should not have conflicts with the Ray port because those are within the container and not exposed on the host. Only the ssh port of the container is exposed on host. This is the same interface (ssh address, key) we have to VMs running on the cloud.

Also, Ray appears to employ some kind of port collision avoidance (at least for the GCS server port 6379, tested by running an HTTP server in parallel to block that port).
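
For anyone wanting to repeat that kind of check, here is a small sketch for testing whether something is already listening on a port, assuming a TCP service on the local machine. The helper name `is_port_in_use` is hypothetical; it says nothing about how Ray itself avoids collisions.

```python
import socket


def is_port_in_use(port: int, host: str = '127.0.0.1') -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising.
        return s.connect_ex((host, port)) == 0
```

Running `is_port_in_use(6379)` before and after starting a blocking server on that port would show whether the port is occupied.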

romilbhardwaj avatar Oct 10 '22 01:10 romilbhardwaj

Since this is a blocker for local mode, I would be happy to implement this (if @michaelzhiluo is too busy). Are people okay with the IP:PORT format?

ewzeng avatar Oct 21 '22 00:10 ewzeng

SGTM!

michaelzhiluo avatar Oct 21 '22 04:10 michaelzhiluo

I'm running into an issue where ray up hangs if the ssh service runs on a port other than 22. I suspect this is because ray up uses port 22 to ssh into cluster nodes. If this is the case and ray does not provide any option to change this (I didn't see anything in the ray documentation), then it might be difficult to implement this feature.

Any thoughts?

ewzeng avatar Oct 25 '22 17:10 ewzeng

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar May 27 '23 02:05 github-actions[bot]

This issue was closed because it has been stalled for 10 days with no activity.

github-actions[bot] avatar Jun 06 '23 02:06 github-actions[bot]

@romilbhardwaj this issue is relevant for K8s backend, right?

gilv avatar Jun 11 '23 14:06 gilv

Yes, K8s needs custom ssh ports in the cloud_vm_ray_backend. It is fixed in the k8s_cloud branch.

This issue specifically tracks custom ssh ports for the onprem feature, which will no longer be required. This issue can be closed.

romilbhardwaj avatar Jun 11 '23 15:06 romilbhardwaj