Cannot Connect to Pod's Exposed Public IP & Port from Pod within same Region
Describe the bug
Ports exposed through a public TCP IP cannot be reached from inside another pod in the same region.
To Reproduce
- Use this script to create two pods in the same region:
```python
import runpod
import base64
import os
from rich import print


def create(name: str, region: str):
    with open(os.path.expanduser('~/.ssh/id_rsa.pub'), 'r', encoding='utf-8') as f:
        public_key = f.read().strip()
    setup_cmd = (
        # Setting up SSH here
        'prefix_cmd() '
        '{ if [ $(id -u) -ne 0 ]; then echo "sudo"; else echo ""; fi; }; '
        '$(prefix_cmd) apt update;'
        'export DEBIAN_FRONTEND=noninteractive;'
        '$(prefix_cmd) apt install openssh-server rsync curl patch -y;'
        '$(prefix_cmd) mkdir -p /var/run/sshd; '
        '$(prefix_cmd) '
        'sed -i "s/PermitRootLogin prohibit-password/PermitRootLogin yes/" '
        '/etc/ssh/sshd_config; '
        '$(prefix_cmd) sed '
        '"s@session\\s*required\\s*pam_loginuid.so@session optional '
        'pam_loginuid.so@g" -i /etc/pam.d/sshd; '
        'cd /etc/ssh/ && $(prefix_cmd) ssh-keygen -A; '
        '$(prefix_cmd) mkdir -p ~/.ssh; '
        '$(prefix_cmd) chown -R $(whoami) ~/.ssh;'
        '$(prefix_cmd) chmod 700 ~/.ssh; '
        f'$(prefix_cmd) echo "{public_key}" >> ~/.ssh/authorized_keys; '
        '$(prefix_cmd) chmod 644 ~/.ssh/authorized_keys; '
        '$(prefix_cmd) service ssh restart; '
        '[ $(id -u) -eq 0 ] && echo alias sudo="" >> ~/.bashrc;'
        # Starting a test HTTP server
        'python3 -m http.server 9000'
    )
    encoded = base64.b64encode(setup_cmd.encode('utf-8')).decode('utf-8')
    pod = runpod.create_pod(
        name=name,
        image_name="runpod/base:0.0.2",
        gpu_type_id="NVIDIA RTX A4000",
        country_code=region,
        ports="22/tcp,9000/tcp",
        support_public_ip=True,
        docker_args=f'bash -c \'echo {encoded} | base64 --decode > init.sh; bash init.sh\''
    )
    return pod['id']


rp1_id = create("rp1", "CA")
rp2_id = create("rp2", "CA")
print(f"rp1_id = '{rp1_id}'")
print(f"rp2_id = '{rp2_id}'")
```
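The next script raises "is not ready" until each pod's runtime reports its public port mappings, so it can help to wait for them first. The following is a minimal sketch, not part of the original report: `wait_for_public_ports` is a hypothetical helper, and the field names (`runtime`, `ports`, `isIpPublic`, `privatePort`) are assumed from what `get_cmd` below reads. `get_pod` is passed in so the polling logic can be exercised without the API.

```python
import time
from typing import Any, Callable, Dict, Iterable, List


def wait_for_public_ports(
    pod_id: str,
    get_pod: Callable[[str], Dict[str, Any]],
    required: Iterable[int] = (22, 9000),
    timeout: float = 600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, Any]]:
    """Poll get_pod(pod_id) until every private port in `required` is mapped
    to a public IP, then return the runtime 'ports' list.

    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    wanted = set(required)
    deadline = time.monotonic() + timeout
    while True:
        pod = get_pod(pod_id) or {}
        # 'runtime' is None (or has no ports) while the pod is still starting
        ports = (pod.get("runtime") or {}).get("ports") or []
        public = {p["privatePort"] for p in ports if p.get("isIpPublic")}
        if wanted <= public:
            return ports
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Pod {pod_id}: ports {sorted(wanted - public)} not public after {timeout}s"
            )
        sleep(interval)
```

For example, calling `wait_for_public_ports(rp1_id, runpod.get_pod)` and the same for `rp2_id` before running the next script should avoid the "not ready" errors while the pods boot.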
- Use this script to generate the test commands:
```python
import runpod


def get_cmd(pod_id: str):
    pod_stat = runpod.get_pod(pod_id)
    runtime = pod_stat.get('runtime') or {}
    ports_info = runtime.get('ports', [])
    if not ports_info:
        raise ValueError(f"Pod {pod_id} is not ready.")
    ssh_cmd = None
    curl_cmd = None
    # Build an ssh command for public port 22 and a curl command for public port 9000
    for p in ports_info:
        if p['isIpPublic']:
            if p['privatePort'] == 22:
                ssh_cmd = f'ssh -i ~/.ssh/id_rsa -p {p["publicPort"]} root@{p["ip"]}'
            if p['privatePort'] == 9000:
                curl_cmd = f'curl http://{p["ip"]}:{p["publicPort"]}'
    assert ssh_cmd is not None and curl_cmd is not None, f"Pod {pod_id} is not ready."
    return ssh_cmd, curl_cmd


# Fill in the pod IDs printed by the previous script
rp1_id = 'qi5a6pnu01x2zl'
rp2_id = '3k3hy87mtr2old'
rp1_ssh, rp1_curl = get_cmd(rp1_id)
rp2_ssh, rp2_curl = get_cmd(rp2_id)
print(rp1_curl)
print(rp2_curl)
# Run each pod's curl command from inside the *other* pod via ssh
print(f'{rp1_ssh} {rp2_curl}')
print(f'{rp2_ssh} {rp1_curl}')
```
Example output:
```
curl http://69.30.85.69:22145
curl http://69.30.85.69:22186
ssh -i ~/.ssh/id_rsa -p 22144 root@69.30.85.69 curl http://69.30.85.69:22186
ssh -i ~/.ssh/id_rsa -p 22185 root@69.30.85.69 curl http://69.30.85.69:22145
```
- Run the four commands printed by the script. The first two (run from the laptop that makes the RunPod API calls) succeed, but the third and fourth (which run curl inside one pod against the other pod's endpoint) fail with "Connection refused":
```
$ curl http://69.30.85.69:22145
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
...
$ curl http://69.30.85.69:22186
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
...
$ ssh -i ~/.ssh/id_rsa -p 22144 root@69.30.85.69 curl http://69.30.85.69:22186
The authenticity of host '[69.30.85.69]:22144 ([69.30.85.69]:22144)' can't be established.
ECDSA key fingerprint is SHA256:8wlRef+5KXU62d7TkPvMan6bkdkyUgPxt4qP4WyWFrw.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[69.30.85.69]:22144' (ECDSA) to the list of known hosts.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to 69.30.85.69 port 22186: Connection refused
$ ssh -i ~/.ssh/id_rsa -p 22185 root@69.30.85.69 curl http://69.30.85.69:22145
The authenticity of host '[69.30.85.69]:22185 ([69.30.85.69]:22185)' can't be established.
ECDSA key fingerprint is SHA256:8wlRef+5KXU62d7TkPvMan6bkdkyUgPxt4qP4WyWFrw.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[69.30.85.69]:22185' (ECDSA) to the list of known hosts.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to 69.30.85.69 port 22145: Connection refused
```
- The same commands work if the two pods are in different regions (tested with CA and SE).
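To tell "Connection refused" apart from a timeout or a successful connect without relying on curl, the check can be reduced to a bare TCP connect. A minimal sketch using only the Python standard library; `probe_tcp` is a hypothetical helper name, not something from the report or the runpod package.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a plain TCP connect to host:port and classify the outcome.

    Returns "open" if the connect succeeds, "refused" if it is actively
    rejected (ECONNREFUSED, what curl reports above as error 7
    "Connection refused"), "timeout" if nothing answers in time,
    or "error: ..." for any other socket error.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Run inside rp1, `print(probe_tcp("69.30.85.69", 22186))` would, going by the curl output above, be expected to report "refused", while the same call from the laptop should report "open".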
Expected behavior
The exposed endpoint should be accessible from anywhere, including from other pods started on RunPod.
Screenshots
Please see the console logs above.
Desktop (please complete the following information):
```
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye
$ pip show runpod
Name: runpod
Version: 1.7.0
Summary: 🐍 | Python library for RunPod API and serverless worker SDK.
Home-page: https://runpod.io
Author: RunPod
Author-email: RunPod <[email protected]>, Justin Merrell <[email protected]>
License: MIT License
Location: /home/memory/install/miniconda3/envs/sky/lib/python3.9/site-packages
Requires: aiohttp, aiohttp-retry, backoff, boto3, click, colorama, cryptography, fastapi, inquirerpy, paramiko, prettytable, py-cpuinfo, requests, tomli, tomlkit, tqdm-loggable, urllib3, watchdog
Required-by:
```
Additional context
None
I also need help connecting to pods. Connecting via "Basic SSH Terminal" works, but "SSH over exposed TCP" doesn't. I checked the ~/.ssh/authorized_keys file on the pod, and it matches the public key corresponding to the private key I'm using to SSH. The error I receive is:
```
ssh: connect to host 213.173.108.100 port 12157: Connection refused
```
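"Connection refused" means the TCP connection itself was rejected before SSH authentication started; a key mismatch would instead fail after connecting, with "Permission denied (publickey)". So one thing worth checking from the Basic SSH Terminal is whether anything is listening on port 22 inside the pod. A minimal, Linux-only sketch that reads the kernel's TCP tables; `listening_ports` and `read_listening_ports` are hypothetical helper names. If `ss` is installed, `ss -ltn` shows the same information. Note this only checks the pod side, not RunPod's port forwarding.

```python
from typing import Iterable, Set


def listening_ports(proc_net_tcp: str) -> Set[int]:
    """Return the local ports in LISTEN state from the text of /proc/net/tcp(6)."""
    ports: Set[int] = set()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the column-header line
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == "0A":  # 0x0A is the kernel's TCP_LISTEN state
            # local_address is "HEXIP:HEXPORT"; the port follows the last colon
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return ports


def read_listening_ports(
    paths: Iterable[str] = ("/proc/net/tcp", "/proc/net/tcp6"),
) -> Set[int]:
    """Union of listening ports over the IPv4 and IPv6 tables (Linux only)."""
    found: Set[int] = set()
    for path in paths:
        try:
            with open(path, encoding="ascii") as f:
                found |= listening_ports(f.read())
        except FileNotFoundError:
            continue
    return found
```

Inside the pod, `print(22 in read_listening_ports())` printing `False` would suggest sshd is not running, which fits the workaround below that installs and starts it explicitly.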
I am having the same issue as above.
Same issue here as well!
Same issue.
Same issue. Any fix?
I encountered the same issue: basic SSH connections worked, but "SSH over exposed TCP" failed with a Connection refused error. My public and private SSH keys were fine, and I tested with multiple PyTorch container images without success.
What solved it for me was restarting the pod while overriding the Container Start Command with:
```
bash -c "apt update;apt install -y wget;DEBIAN_FRONTEND=noninteractive apt-get install openssh-server -y;mkdir -p ~/.ssh;cd $_;chmod 700 ~/.ssh;echo YOUR_PUBLIC_KEY > authorized_keys;chmod 700 authorized_keys;service ssh start;sleep infinity"
```
According to their official tutorial, this step shouldn’t be necessary anymore for their provided template images, but I found it was required in my case.
Hope this helps!
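To generate a start command like the one above for a specific key without hand-quoting it, a small builder can help. This is a sketch under stated assumptions: `build_start_command` is a hypothetical helper; it assumes the Container Start Command field is split like a shell command line (as the `bash -c "..."` form above suggests, though the thread does not confirm how RunPod parses it); and it deliberately differs from the quoted command by skipping the wget install, writing `~/.ssh/authorized_keys` by full path instead of `cd $_`, and using mode 600 for `authorized_keys`.

```python
import shlex


def build_start_command(public_key: str) -> str:
    """Return a `bash -c '...'` start command that installs and starts sshd
    and authorizes `public_key` for the container user (a variant of the
    command above)."""
    steps = [
        "apt update",
        "DEBIAN_FRONTEND=noninteractive apt-get install -y openssh-server",
        "mkdir -p ~/.ssh",
        "chmod 700 ~/.ssh",
        # shlex.quote keeps the whole "type base64 comment" key as one argument
        f"echo {shlex.quote(public_key.strip())} > ~/.ssh/authorized_keys",
        "chmod 600 ~/.ssh/authorized_keys",
        "service ssh start",
        "sleep infinity",  # keep the container's main process alive
    ]
    # Quote the whole script once more so it reaches bash -c as a single argument
    return "bash -c " + shlex.quote("; ".join(steps))
```

For example, `print(build_start_command(open(os.path.expanduser("~/.ssh/id_rsa.pub")).read()))` prints a string to paste into the start command field.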
I have the same issue: ssh: connect to host 38.128.233.55 port 14016: Connection refused
@lilyzhng Could you try https://github.com/justinwlin/Runpod-SSH-Password?
I created a small GitHub repo that switches the pod to password-based SSH, which may be easier than setting up SSH keys :)
I've tested it from a local machine to a pod, and from pod to pod. Let me know if this solves the issue.
Same issue. Any updates?