runpod-python

Cannot Connect to Pod's Exposed Public IP & Port from a Pod within the Same Region

Status: Open. cblmemo opened this issue 1 year ago; 9 comments.

Describe the bug

Ports exposed through a pod's public TCP IP cannot be reached from other pods in the same region.

To Reproduce

  1. Use this script to create two pods in the same region:
import runpod
import base64
import os
from rich import print

def create(name: str, region: str):
    with open(os.path.expanduser('~/.ssh/id_rsa.pub'), 'r', encoding='utf-8') as f:
        public_key = f.read().strip()
    setup_cmd = (
        # Setting up SSH here
        'prefix_cmd() '
        '{ if [ $(id -u) -ne 0 ]; then echo "sudo"; else echo ""; fi; }; '
        '$(prefix_cmd) apt update;'
        'export DEBIAN_FRONTEND=noninteractive;'
        '$(prefix_cmd) apt install openssh-server rsync curl patch -y;'
        '$(prefix_cmd) mkdir -p /var/run/sshd; '
        '$(prefix_cmd) '
        'sed -i "s/PermitRootLogin prohibit-password/PermitRootLogin yes/" '
        '/etc/ssh/sshd_config; '
        '$(prefix_cmd) sed '
        '"s@session\\s*required\\s*pam_loginuid.so@session optional '
        'pam_loginuid.so@g" -i /etc/pam.d/sshd; '
        'cd /etc/ssh/ && $(prefix_cmd) ssh-keygen -A; '
        '$(prefix_cmd) mkdir -p ~/.ssh; '
        '$(prefix_cmd) chown -R $(whoami) ~/.ssh;'
        '$(prefix_cmd) chmod 700 ~/.ssh; '
        f'$(prefix_cmd) echo "{public_key}" >> ~/.ssh/authorized_keys; '
        '$(prefix_cmd) chmod 644 ~/.ssh/authorized_keys; '
        '$(prefix_cmd) service ssh restart; '
        '[ $(id -u) -eq 0 ] && echo alias sudo="" >> ~/.bashrc;'
        # Starting a test HTTP server
        'python3 -m http.server 9000'
    )
    encoded = base64.b64encode(setup_cmd.encode('utf-8')).decode('utf-8')
    pod = runpod.create_pod(
        name=name,
        image_name="runpod/base:0.0.2",
        gpu_type_id="NVIDIA RTX A4000",
        country_code=region,
        ports="22/tcp,9000/tcp",
        support_public_ip=True,
        docker_args=f'bash -c \'echo {encoded} | base64 --decode > init.sh; bash init.sh\''
    )
    return pod['id']

rp1_id = create("rp1", "CA")
rp2_id = create("rp2", "CA")

print(f"rp1_id = '{rp1_id}'")
print(f"rp2_id = '{rp2_id}'")
  2. Use this script to get the test commands:
import runpod

def get_cmd(pod_id: str):
    pod_stat = runpod.get_pod(pod_id)
    runtime = pod_stat.get('runtime') or {}
    ports_info = runtime.get('ports', [])
    if not ports_info:
        raise ValueError(f"Pod {pod_id} is not ready.")
    ssh_cmd = None
    curl_cmd = None
    for p in ports_info:
        if p['isIpPublic']:
            if p['privatePort'] == 22:
                ssh_cmd = f'ssh -i ~/.ssh/id_rsa -p {p["publicPort"]} root@{p["ip"]}'
            if p['privatePort'] == 9000:
                curl_cmd = f'curl http://{p["ip"]}:{p["publicPort"]}'
    assert ssh_cmd is not None and curl_cmd is not None, f"Pod {pod_id} is not ready."
    return ssh_cmd, curl_cmd

# Fill in the pod IDs printed by the previous script
rp1_id = 'qi5a6pnu01x2zl'
rp2_id = '3k3hy87mtr2old'

rp1_ssh, rp1_curl = get_cmd(rp1_id)
rp2_ssh, rp2_curl = get_cmd(rp2_id)

print(rp1_curl)
print(rp2_curl)

print(f'{rp1_ssh} {rp2_curl}')
print(f'{rp2_ssh} {rp1_curl}')

Example output:

curl http://69.30.85.69:22145
curl http://69.30.85.69:22186
ssh -i ~/.ssh/id_rsa -p 22144 root@69.30.85.69 curl http://69.30.85.69:22186
ssh -i ~/.ssh/id_rsa -p 22185 root@69.30.85.69 curl http://69.30.85.69:22145
  3. Run the four commands printed by the script. The first two (run from the laptop making the RunPod API calls) succeed, but the third and fourth (which run curl inside a pod) fail:
$ curl http://69.30.85.69:22145
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
...

$ curl http://69.30.85.69:22186
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
...

$ ssh -i ~/.ssh/id_rsa -p 22144 root@69.30.85.69 curl http://69.30.85.69:22186
The authenticity of host '[69.30.85.69]:22144 ([69.30.85.69]:22144)' can't be established.
ECDSA key fingerprint is SHA256:8wlRef+5KXU62d7TkPvMan6bkdkyUgPxt4qP4WyWFrw.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[69.30.85.69]:22144' (ECDSA) to the list of known hosts.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to 69.30.85.69 port 22186: Connection refused

$ ssh -i ~/.ssh/id_rsa -p 22185 root@69.30.85.69 curl http://69.30.85.69:22145
The authenticity of host '[69.30.85.69]:22185 ([69.30.85.69]:22185)' can't be established.
ECDSA key fingerprint is SHA256:8wlRef+5KXU62d7TkPvMan6bkdkyUgPxt4qP4WyWFrw.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[69.30.85.69]:22185' (ECDSA) to the list of known hosts.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to 69.30.85.69 port 22145: Connection refused
  4. The same commands work if the two pods are in different regions (tested with CA and SE).
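The reachability check in steps 3 and 4 can also be scripted in Python, which avoids depending on curl inside the pod. The sketch below is hypothetical: `public_endpoints` and `probe` are illustrative helper names, not part of runpod-python. `public_endpoints` reads the same `isIpPublic`, `privatePort`, `publicPort` and `ip` fields that `get_cmd` uses, and `probe` reports whether a TCP connect succeeds, is actively refused, or times out, which separates a refusal from traffic that is silently dropped.

```python
import socket
from typing import Dict, List, Tuple


def public_endpoints(ports_info: List[dict]) -> Dict[int, Tuple[str, int]]:
    """Map each private port to its public (ip, port) pair.

    Assumes the entry layout read by get_cmd: 'isIpPublic', 'privatePort',
    'publicPort' and 'ip' keys in runtime['ports'].
    """
    endpoints = {}
    for p in ports_info:
        if p.get("isIpPublic"):
            endpoints[p["privatePort"]] = (p["ip"], p["publicPort"])
    return endpoints


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and return 'open', 'refused', 'timeout' or 'error:<name>'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The remote side (or something on the path) answered with a reset
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout, e.g. packets being dropped
        return "timeout"
    except OSError as exc:
        return f"error:{type(exc).__name__}"
```

For example, running `probe(*public_endpoints(rp2_ports)[9000])` inside rp1, where `rp2_ports` is rp2's `runtime['ports']` list, and comparing it with the same call from a machine outside RunPod shows whether the failure is specific to same-region traffic. A `'refused'` result inside the pod alongside `'open'` from outside would match the behavior reported above.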

Expected behavior

The exposed endpoint should be reachable from anywhere, including from other pods started by RunPod.

Screenshots

Please see the console logs above.

Desktop (please complete the following information):

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
$ pip show runpod       
Name: runpod
Version: 1.7.0
Summary: 🐍 | Python library for RunPod API and serverless worker SDK.
Home-page: https://runpod.io
Author: RunPod
Author-email: RunPod <[email protected]>, Justin Merrell <[email protected]>
License: MIT License
Location: /home/memory/install/miniconda3/envs/sky/lib/python3.9/site-packages
Requires: aiohttp, aiohttp-retry, backoff, boto3, click, colorama, cryptography, fastapi, inquirerpy, paramiko, prettytable, py-cpuinfo, requests, tomli, tomlkit, tqdm-loggable, urllib3, watchdog
Required-by:

Additional context

None

cblmemo, Aug 29 '24

I also need help with connecting to pods. Connecting via "Basic SSH Terminal" works, but "SSH over exposed TCP" doesn't. I checked the ~/.ssh/authorized_keys file on the pod, and it matches the public key corresponding to the private key I'm using to SSH. The error I get is:

ssh: connect to host 213.173.108.100 port 12157: Connection refused
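One thing worth ruling out when the exposed port refuses connections is whether anything is listening on the private port inside the pod at all. The sketch below is a hypothetical helper (not a RunPod tool) that runs inside the pod, for example from the Basic SSH Terminal, and reads the Linux kernel's `/proc/net/tcp` and `/proc/net/tcp6` tables, so it does not need `ss` or `netstat` installed. It assumes a Linux container; in those tables, state `0A` means LISTEN and the local port is the hex number after the colon (e.g. `0016` is 22).

```python
from pathlib import Path
from typing import Set

LISTEN = "0A"  # TCP LISTEN state code in /proc/net/tcp and /proc/net/tcp6


def listening_ports(proc_net_tcp: str) -> Set[int]:
    """Return the local ports in LISTEN state from the text of /proc/net/tcp(6)."""
    ports = set()
    for line in proc_net_tcp.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == LISTEN:
            # local_address looks like HEXIP:HEXPORT, e.g. 00000000:0016
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return ports


def pod_listening_ports() -> Set[int]:
    """Collect listening TCP ports (IPv4 and IPv6) on this Linux machine."""
    ports = set()
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if path.exists():
            ports |= listening_ports(path.read_text())
    return ports
```

If `22 in pod_listening_ports()` is False, sshd is not running in the pod, which would be consistent with "Connection refused". If it is True, the refusal is more likely coming from somewhere along the forwarding path.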

keyboardAnt, Sep 12 '24

I am having the same issue as above.

arthur-d3nt, Feb 07 '25

Same issue here as well!

samuelzxu, Feb 16 '25

Same issue.

Kick28, Feb 27 '25

Same issue. Any fix?

aldrinrayen, Mar 22 '25

I encountered the same issue: basic SSH connections worked, but "SSH over exposed TCP" failed with a Connection refused error. My public and private SSH keys were fine, and I tested with multiple PyTorch container images without success.

What solved it for me was restarting the pod while overriding the Container Start Command with: bash -c "apt update;apt install -y wget;DEBIAN_FRONTEND=noninteractive apt-get install openssh-server -y;mkdir -p ~/.ssh;cd $_;chmod 700 ~/.ssh;echo YOUR_PUBLIC_KEY > authorized_keys;chmod 700 authorized_keys;service ssh start;sleep infinity"

According to their official tutorial, this step shouldn’t be necessary anymore for their provided template images, but I found it was required in my case.

Hope this helps!
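Pasting a public key into a hand-written start command is easy to get wrong, because the key contains spaces and the whole command has to survive as one `bash -c` argument. The sketch below is a hypothetical helper (`ssh_start_command` is not a RunPod API) that builds an equivalent command with `shlex.quote`. It mirrors the steps in the command above, with two assumptions of its own: it drops the unrelated `wget` install, and it sets `authorized_keys` to 600, the usual permission, where the quoted command uses 700.

```python
import shlex


def ssh_start_command(public_key: str) -> str:
    """Build a container start command that installs sshd, adds one public key,
    starts sshd and then keeps the container alive."""
    steps = [
        "apt update",
        "DEBIAN_FRONTEND=noninteractive apt-get install -y openssh-server",
        "mkdir -p ~/.ssh",
        "chmod 700 ~/.ssh",
        # shlex.quote keeps the key as a single shell word despite its spaces
        f"echo {shlex.quote(public_key.strip())} > ~/.ssh/authorized_keys",
        "chmod 600 ~/.ssh/authorized_keys",
        "service ssh start",
        "sleep infinity",  # keep the container's main process running
    ]
    script = "; ".join(steps)
    # Quote the whole script once more so bash -c receives it as one argument
    return "bash -c " + shlex.quote(script)
```

The returned string could then be passed as `docker_args` to `runpod.create_pod`, in the same way the reproduction script in the issue passes its base64-encoded init script.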

jacopo-minniti, Apr 20 '25

I have the same issue: ssh: connect to host 38.128.233.55 port 14016: Connection refused

lilyzhng, Jul 01 '25

@lilyzhng Sorry, do you think you can try: https://github.com/justinwlin/Runpod-SSH-Password

I created a small GitHub repo that helps switch to password-based SSH, which may be easier than SSH keys :)

I've tested it from a local machine to a pod, and from pod to pod. Let me know if this solves the issue.

justinwlin, Jul 01 '25

Same issue. Any updates?

martin3252, Oct 16 '25