
Statefulset DNS resolution fails when exposing a service

Open albacanete opened this issue 1 year ago • 10 comments

Describe the bug I do not understand how DNS resolution works between two clusters that run StatefulSets. Within a single cluster, I can access a Pod through its name (by deploying a headless Service), but I cannot do the same when using Skupper. Am I missing something?
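
For context, within a single cluster a headless Service gives each StatefulSet Pod a stable per-pod DNS record of the following form (standard Kubernetes behavior; the concrete name is an example built from the resources in this report):

    <pod-name>.<service-name>.<namespace>.svc.cluster.local
    # e.g. compss-matmul-4fc9d6-worker-0.compss-matmul-4fc9d6.compss.svc.cluster.local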

How To Reproduce

  1. Two clusters were created using kubeadm v1.29.5, containerd as CRI and Flannel as CNI.

    Edge cluster

    acanete@rpi42:~$ kubectl get nodes
    NAME    STATUS   ROLES           AGE   VERSION
    agx14   Ready    <none>          19d   v1.29.5
    agx15   Ready    <none>          19d   v1.29.5
    rpi42   Ready    control-plane   19d   v1.29.5
    

    HPC cluster

    acanete@nano1:~$ kubectl get nodes
    NAME          STATUS   ROLES           AGE   VERSION
    nano1         Ready    control-plane   19d   v1.29.5
    workstation   Ready    <none>          19d   v1.29.5
    
  2. Create a namespace with the same name in each cluster.

    Edge cluster

    kubectl create ns compss
    

    HPC cluster

    kubectl create ns compss
    
  3. Deploy Skupper in each namespace. Since I am deploying it on-prem and with private IP addresses, I am using NodePort.

    Edge cluster. 192.168.50.15 is the IP address of the agx14 node.

    skupper init -n compss --ingress nodeport --ingress-host 192.168.50.15
    

    HPC cluster. 192.168.50.61 is the IP address of the workstation node.

    skupper init -n compss --ingress nodeport --ingress-host 192.168.50.61
    
  4. Link namespaces

    Edge cluster

    skupper -n compss token create edge.token 
    

    HPC cluster. The edge.token file was copied to a machine on the HPC cluster.

    skupper -n compss link create edge.token
    

    Output

    acanete@rpi42:~$ skupper -n compss link status
    
    Links created from this site:
    
        There are no links configured or connected
    
    Current links from other sites that are connected:
    
        Incoming link from site 472fdc04-1406-4281-bbf9-81f5e5ad3737 on namespace compss
    
    acanete@nano1:~$ skupper -n compss link status
    
    Links created from this site:
    
        Link link1 is connected
    
    Current links from other sites that are connected:
    
        There are no connected links
    
    
  5. Deploy test applications in both clusters

    The YAML file for the StatefulSet that runs in the edge cluster is

    apiVersion: v1
    kind: Service
    metadata:
      name: compss-matmul-4fc9d6
      namespace: compss
    spec:
      clusterIP: None  # This makes it a headless service
      selector:
        app: compss
        wf_id: compss-matmul-4fc9d6
      ports:
      - name: port-22
        protocol: TCP
        port: 22
        targetPort: ssh-port
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: compss-matmul-4fc9d6-worker
      namespace: compss
    spec:
      selector:
        matchLabels:
          app: compss
          wf_id: compss-matmul-4fc9d6
          pod-hostname: worker
      serviceName: compss-matmul-4fc9d6
      replicas: 2
      ordinals: 
        start: 2
      template:
        metadata:
          labels:
            app: compss
            wf_id: compss-matmul-4fc9d6
            pod-hostname: worker
        spec:
          subdomain: compss-matmul-4fc9d6
          dnsConfig:
            searches:
            - compss-matmul-4fc9d6.compss.svc.cluster.local
          containers:
          - name: worker
            image: albabsc/compss-matmul:verge-0.1.8
            command: [ "/usr/sbin/sshd",  "-D" ]
            resources:
              limits:
                memory: 2G
                cpu: 4
            ports:
            - containerPort: 22
              name: ssh-port
    
    

    The YAML file for the StatefulSet that runs in the HPC cluster is

    apiVersion: v1
    kind: Service
    metadata:
      name: compss-matmul-4fc9d6
      namespace: compss
    spec:
      clusterIP: None  # This makes it a headless service
      selector:
        app: compss
        wf_id: compss-matmul-4fc9d6
      ports:
      - name: port-22
        protocol: TCP
        port: 22
        targetPort: ssh-port
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: compss-matmul-4fc9d6-worker
      namespace: compss
    spec:
      selector:
        matchLabels:
          app: compss
          wf_id: compss-matmul-4fc9d6
          pod-hostname: worker
      serviceName: compss-matmul-4fc9d6
      replicas: 2
      template:
        metadata:
          labels:
            app: compss
            wf_id: compss-matmul-4fc9d6
            pod-hostname: worker
        spec:
          subdomain: compss-matmul-4fc9d6
          dnsConfig:
            searches:
            - compss-matmul-4fc9d6.compss.svc.cluster.local
          containers:
          - name: worker
            image: albabsc/compss-matmul:verge-0.1.8
            command: [ "/usr/sbin/sshd",  "-D" ]
            resources:
              limits:
                memory: 2G
                cpu: 4
            ports:
            - containerPort: 22
              name: ssh-port
    
    
  6. Ensure connectivity among pods of the same cluster

    HPC cluster

    acanete@nano1:~$ kubectl -n compss get pods
    NAME                                          READY   STATUS    RESTARTS   AGE
    compss-matmul-4fc9d6-worker-0                 1/1     Running   0          4m19s
    compss-matmul-4fc9d6-worker-1                 1/1     Running   0          4m18s
    skupper-router-6895bb6f95-88hnj               2/2     Running   0          55m
    skupper-service-controller-559ddbdd56-wnsvh   1/1     Running   0          55m
    
    acanete@nano1:~$ kubectl -n compss exec -it compss-matmul-4fc9d6-worker-0 -- bash
    root@compss-matmul-4fc9d6-worker-0:/# ssh compss-matmul-4fc9d6-worker-1
    Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 6.8.0-47-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/pro
    
    This system has been minimized by removing packages and content that are
    not required on a system that users do not log into.
    
    To restore this content, you can run the 'unminimize' command.
    Last login: Wed Nov  6 12:33:29 2024 from 10.244.1.160
    root@compss-matmul-4fc9d6-worker-1:~# 
    

    Edge cluster

    acanete@rpi42:~$ kubectl -n compss get pods
    NAME                                          READY   STATUS    RESTARTS   AGE
    compss-matmul-4fc9d6-worker-2                 1/1     Running   0          4m48s
    compss-matmul-4fc9d6-worker-3                 1/1     Running   0          4m47s
    skupper-router-748c487879-gvpxg               2/2     Running   0          56m
    skupper-service-controller-6f69b974bd-grgzc   1/1     Running   0          56m
    
    acanete@rpi42:~$ kubectl -n compss exec -ti compss-matmul-4fc9d6-worker-2 -- bash
    root@compss-matmul-4fc9d6-worker-2:/# ssh compss-matmul-4fc9d6-worker-3
    Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 5.10.192-tegra aarch64)
    
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/pro
    
    This system has been minimized by removing packages and content that are
    not required on a system that users do not log into.
    
    To restore this content, you can run the 'unminimize' command.
    Last login: Wed Nov  6 12:49:41 2024 from 10.244.2.97
    root@compss-matmul-4fc9d6-worker-3:~# 
    
  7. Expose service with Skupper

    Command executed in the edge cluster

    skupper -n compss expose service compss-matmul-4fc9d6 --port 22 --address compss-matmul-4fc9d6
    

    Check the service is correctly created

    acanete@rpi42:~$ skupper -n compss service status
    Services exposed through Skupper:
    ╰─ compss-matmul-4fc9d6:22 (tcp)
    
    acanete@nano1:~$ skupper -n compss service status
    Services exposed through Skupper:
    ╰─ compss-matmul-4fc9d6:22 (tcp)
    
  8. DNS resolution no longer works

    When trying to ssh between two pods of the same cluster (e.g. the HPC cluster)

    acanete@nano1:~$ kubectl -n compss get pods
    NAME                                          READY   STATUS    RESTARTS   AGE
    compss-matmul-4fc9d6-worker-0                 1/1     Running   0          4m19s
    compss-matmul-4fc9d6-worker-1                 1/1     Running   0          4m18s
    skupper-router-6895bb6f95-88hnj               2/2     Running   0          55m
    skupper-service-controller-559ddbdd56-wnsvh   1/1     Running   0          55m
    
    acanete@nano1:~$ kubectl -n compss exec -it compss-matmul-4fc9d6-worker-0 -- bash
    root@compss-matmul-4fc9d6-worker-0:/# ssh compss-matmul-4fc9d6-worker-1
    ssh: Could not resolve hostname compss-matmul-4fc9d6-worker-1: No address associated with hostname
    root@compss-matmul-4fc9d6-worker-0:/# ssh compss-matmul-4fc9d6-worker-2
    ssh: Could not resolve hostname compss-matmul-4fc9d6-worker-2: No address associated with hostname
    root@compss-matmul-4fc9d6-worker-0:/# 
    

Expected behavior I would like every Pod of a StatefulSet to be accessible through its Pod name, or to know which name I have to use, and whether the name is different when a Pod in cluster 1 wants to access a Pod in cluster 2.

Environment details

  • Skupper CLI: 1.8.1
  • Skupper Operator (if applicable): none
  • Platform: kubernetes

Additional context Pods have the following /etc/resolv.conf file

search compss.svc.cluster.local svc.cluster.local cluster.local lan compss-matmul-4fc9d6.compss.svc.cluster.local
nameserver 10.96.0.10
options ndots:5
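
For what it's worth, that last search entry is what lets a bare pod name resolve inside its own cluster: the resolver appends it to unqualified names (standard search-list behavior, shown here with names from this setup):

    # bare name -> FQDN tried by the resolver
    compss-matmul-4fc9d6-worker-1 -> compss-matmul-4fc9d6-worker-1.compss-matmul-4fc9d6.compss.svc.cluster.local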

albacanete avatar Nov 06 '24 12:11 albacanete

Hello Alba,

You're exposing a service, but if your intention is to have direct access to each pod by name, I would recommend adding the "--headless" flag to the "skupper expose" command and, instead of exposing the "service", exposing the "statefulset" workload.
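
A sketch of those commands for this setup (both mirror commands that appear later in this thread; the unexpose is only needed because the service is currently exposed):

    skupper -n compss unexpose service compss-matmul-4fc9d6 --address compss-matmul-4fc9d6
    skupper -n compss expose statefulset compss-matmul-4fc9d6-worker --headless --port 22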

Thank you,


fgiorgetti avatar Nov 06 '24 13:11 fgiorgetti

Hello @fgiorgetti, thanks for the quick answer :)

I have also tried it and have not been able to make it work. What I have done is:

  1. Unexpose the service with skupper -n compss unexpose service compss-matmul-4fc9d6 --address compss-matmul-4fc9d6 and check:
    acanete@rpi42:~$ skupper -n compss service status
    No services defined
    
  2. Exposed the statefulset in the edge cluster by executing skupper -n compss expose statefulset compss-matmul-4fc9d6-worker --headless --port 22. Now two new proxy pods and a svc are created:
    acanete@rpi42:~$ kubectl -n compss get pods
    NAME                                          READY   STATUS    RESTARTS   AGE
    compss-matmul-4fc9d6-proxy-0                  1/1     Running   0          7m49s
    compss-matmul-4fc9d6-proxy-1                  1/1     Running   0          7m46s
    compss-matmul-4fc9d6-worker-2                 1/1     Running   0          57m
    compss-matmul-4fc9d6-worker-3                 1/1     Running   0          57m
    skupper-router-748c487879-gvpxg               2/2     Running   0          108m
    skupper-service-controller-6f69b974bd-grgzc   1/1     Running   0          108m
    
    acanete@rpi42:~$ kubectl -n compss get svc
    NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                          AGE
    compss-matmul-4fc9d6         ClusterIP   None            <none>        22/TCP                                           55m
    compss-matmul-4fc9d6-proxy   ClusterIP   None            <none>        22/TCP                                           5m23s
    skupper-router               NodePort    10.96.210.62    <none>        55671:30581/TCP,45671:30524/TCP,8081:32381/TCP   106m
    skupper-router-local         ClusterIP   10.98.178.162   <none>        5671/TCP                                         106m
    
  3. Tried to connect to a worker in the edge cluster from a worker in the HPC cluster
    acanete@nano1:~$ kubectl -n compss exec -it compss-matmul-4fc9d6-worker-0 -- bash
    root@compss-matmul-4fc9d6-worker-0:/# ssh compss-matmul-4fc9d6-worker-2
    ssh: Could not resolve hostname compss-matmul-4fc9d6-worker-2: No address associated with hostname
    
    Also tried connecting to the newly created proxy pods
    root@compss-matmul-4fc9d6-worker-0:/# ssh compss-matmul-4fc9d6-proxy-0
    ssh: Could not resolve hostname compss-matmul-4fc9d6-proxy-0: No address associated with hostname
    

albacanete avatar Nov 06 '24 13:11 albacanete

Since you're deploying the same statefulset and service on both namespaces, could you try to modify their names in one of the clusters, possibly just changing the suffix in one of them?

This way, skupper will basically create different statefulset proxies and headless services, and will avoid name clashes with the generated resources on each cluster/namespace.

Suppose you modify the suffix in one of your clusters, from 4fc9d6 to 4fc9d7, then you should be able to reach your distinct pods, using the following names:

  • compss-matmul-4fc9d6-worker-0.compss-matmul-4fc9d6
  • compss-matmul-4fc9d6-worker-1.compss-matmul-4fc9d6
  • compss-matmul-4fc9d7-worker-0.compss-matmul-4fc9d7
  • compss-matmul-4fc9d7-worker-1.compss-matmul-4fc9d7

Basically on the remote namespaces, Skupper will create a statefulset and a headless service that have the same name (from the originally exposed statefulset) on the other cluster/namespace. So if your statefulsets and headless services have the same names on both sides, I believe it won't work as expected.
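
Once the names differ, you can sanity-check resolution from inside a pod before trying ssh (nslookup is used the same way later in this thread; the 4fc9d7 name assumes the suffix change suggested above):

    nslookup compss-matmul-4fc9d7-worker-0.compss-matmul-4fc9d7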


fgiorgetti avatar Nov 06 '24 14:11 fgiorgetti

Hello @fgiorgetti, I have modified the YAML files; they are now:

Edge cluster

apiVersion: v1
kind: Service
metadata:
  name: compss-matmul-4fc9d61
  namespace: compss
spec:
  clusterIP: None  # This makes it a headless service
  selector:
    app: compss
    wf_id: compss-matmul-4fc9d61
  ports:
  - name: port-22
    protocol: TCP
    port: 22
    targetPort: ssh-port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: compss-matmul-4fc9d61-worker
  namespace: compss
spec:
  selector:
    matchLabels:
      app: compss
      wf_id: compss-matmul-4fc9d61
      pod-hostname: worker
  serviceName: compss-matmul-4fc9d61
  replicas: 2
  ordinals: 
    start: 2
  template:
    metadata:
      labels:
        app: compss
        wf_id: compss-matmul-4fc9d61
        pod-hostname: worker
    spec:
      subdomain: compss-matmul-4fc9d61
      dnsConfig:
        searches:
        - compss-matmul-4fc9d61.compss.svc.cluster.local
      containers:
      - name: worker
        image: albabsc/compss-matmul:verge-0.1.8
        command: [ "/usr/sbin/sshd",  "-D" ]
        resources:
          limits:
            memory: 2G
            cpu: 4
        ports:
        - containerPort: 22
          name: ssh-port

HPC cluster

apiVersion: v1
kind: Service
metadata:
  name: compss-matmul-4fc9d6
  namespace: compss
spec:
  clusterIP: None  # This makes it a headless service
  selector:
    app: compss
    wf_id: compss-matmul-4fc9d6
  ports:
  - name: port-22
    protocol: TCP
    port: 22
    targetPort: ssh-port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: compss-matmul-4fc9d6-worker
  namespace: compss
spec:
  selector:
    matchLabels:
      app: compss
      wf_id: compss-matmul-4fc9d6
      pod-hostname: worker
  serviceName: compss-matmul-4fc9d6
  replicas: 2
  template:
    metadata:
      labels:
        app: compss
        wf_id: compss-matmul-4fc9d6
        pod-hostname: worker
    spec:
      subdomain: compss-matmul-4fc9d6
      dnsConfig:
        searches:
        - compss-matmul-4fc9d6.compss.svc.cluster.local
      containers:
      - name: worker
        image: albabsc/compss-matmul:verge-0.1.8
        command: [ "/usr/sbin/sshd",  "-D" ]
        resources:
          limits:
            memory: 2G
            cpu: 4
        ports:
        - containerPort: 22
          name: ssh-port

I have deployed both YAMLs and executed the following command on the edge cluster:

acanete@rpi42:~$ skupper -n compss expose statefulset compss-matmul-4fc9d61-worker --headless --port 22

When the statefulset in the edge cluster gets exposed, the new pods appear in the HPC cluster

acanete@nano1:~$ kubectl -n compss get pods
NAME                                          READY   STATUS    RESTARTS   AGE
compss-matmul-4fc9d6-worker-0                 1/1     Running   0          30s
compss-matmul-4fc9d6-worker-1                 1/1     Running   0          29s
compss-matmul-4fc9d61-worker-0                1/1     Running   0          9s
compss-matmul-4fc9d61-worker-1                1/1     Running   0          7s
skupper-router-f88bff6f9-4mskr                2/2     Running   0          98s
skupper-service-controller-655bf9fbf8-8gdln   1/1     Running   0          98s

Now DNS resolution is OK, but ssh fails to create the connection. Do you know if it has to do with the implementation of Skupper's security? The Docker image has the ssh keys inside and I can ssh to pods in the same cluster.

Pod in a different cluster

acanete@nano1:~$ kubectl -n compss exec -ti compss-matmul-4fc9d6-worker-0 -- bash
root@compss-matmul-4fc9d6-worker-0:/# nslookup compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
;; Got recursion not available from 10.96.0.10
Server:		10.96.0.10
Address:	10.96.0.10#53

Name:	compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.compss.svc.cluster.local
Address: 10.244.1.184
;; Got recursion not available from 10.96.0.10

root@compss-matmul-4fc9d6-worker-0:/# ssh compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61         
ssh: connect to host compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61 port 22: Connection refused

Pod in the same cluster

acanete@nano1:~$ kubectl -n compss exec -ti compss-matmul-4fc9d6-worker-0 -- bash
root@compss-matmul-4fc9d6-worker-0:/# ssh compss-matmul-4fc9d6-worker-1
Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 6.8.0-47-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.
Last login: Sat Nov  9 16:39:10 2024 from 10.244.1.182

albacanete avatar Nov 09 '24 16:11 albacanete

Hello Alba,

Looking at your statefulset, I noticed it has the following specification:

    ordinals:
      start: 2

Do you really need to set the start index for your worker pods?

If you remove it, I believe it should work for you, as the remote proxy pods created by Skupper will have the appropriate names and the local proxy pods (on the same cluster and namespace of your exposed statefulset) will target the correct local pods as well.

Otherwise, the worker pods are created as compss-matmul-4fc9d61-worker-2 and compss-matmul-4fc9d61-worker-3, which is currently not supported as proxy pods won't work properly.

In case you can remove the ordinals.start definition, then you should be able to access your pods using <pod-name>.<service-name>, with service-name being the value of spec.serviceName from your statefulset, i.e. (a sketch of the adjusted spec follows the list):

  • compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
  • compss-matmul-4fc9d61-worker-1.compss-matmul-4fc9d61
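
A minimal sketch of the adjusted edge StatefulSet spec, assuming only the ordinals block is dropped and everything else stays as in your YAML:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: compss-matmul-4fc9d61-worker
      namespace: compss
    spec:
      serviceName: compss-matmul-4fc9d61
      replicas: 2     # pods become compss-matmul-4fc9d61-worker-0 and -1
      # ordinals:     # removed: a non-zero start ordinal is not supported
      #   start: 2
      # selector and template unchanged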


fgiorgetti avatar Nov 11 '24 16:11 fgiorgetti

Hello @fgiorgetti, I deployed it as you mentioned but I still get connection refused:

acanete@nano1:~$ kubectl -n compss exec -ti compss-matmul-4fc9d6-worker-0 -- bash
root@compss-matmul-4fc9d6-worker-0:/# ssh compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
ssh: connect to host compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61 port 22: Connection refused
root@compss-matmul-4fc9d6-worker-0:/# ssh compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61 -vvv
OpenSSH_8.9p1 Ubuntu-3ubuntu0.10, OpenSSL 3.0.2 15 Mar 2022
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/root/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/root/.ssh/known_hosts2'
debug2: resolving "compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61" port 22
debug3: resolve_host: lookup compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61:22
debug3: ssh_connect_direct: entering
debug1: Connecting to compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61 [10.244.1.221] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x10
debug1: connect to address 10.244.1.221 port 22: Connection refused
ssh: connect to host compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61 port 22: Connection refused

Pods

acanete@nano1:~$ kubectl -n compss get pods 
NAME                                          READY   STATUS    RESTARTS   AGE
compss-matmul-4fc9d6-worker-0                 1/1     Running   0          17m
compss-matmul-4fc9d6-worker-1                 1/1     Running   0          15m
compss-matmul-4fc9d61-worker-0                1/1     Running   0          7m47s
compss-matmul-4fc9d61-worker-1                1/1     Running   0          7m45s
skupper-router-6d4f86ff78-mws6g               2/2     Running   0          18m
skupper-service-controller-797f97b858-dg4rp   1/1     Running   0          18m

albacanete avatar Nov 19 '24 12:11 albacanete

Hello again @fgiorgetti :)

With further debugging I have realized that the IP of the Pod and the IP resolved by DNS with Skupper are different.

IP of the Pod: 10.244.1.131

root@compss-matmul-4fc9d61-worker-0:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0@if95: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether da:7d:6c:7d:46:e2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.1.131/24 brd 10.244.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::d87d:6cff:fe7d:46e2/64 scope link 
       valid_lft forever preferred_lft forever

IP resolved by DNS: 10.244.1.227

With ping:

root@compss-matmul-4fc9d6-worker-0:/# ping compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
PING compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.compss.svc.cluster.local (10.244.1.227) 56(84) bytes of data.
64 bytes from compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.compss.svc.cluster.local (10.244.1.227): icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.compss.svc.cluster.local (10.244.1.227): icmp_seq=2 ttl=64 time=0.029 ms
64 bytes from compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.compss.svc.cluster.local (10.244.1.227): icmp_seq=3 ttl=64 time=0.028 ms
64 bytes from compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.compss.svc.cluster.local (10.244.1.227): icmp_seq=4 ttl=64 time=0.055 ms
64 bytes from compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.compss.svc.cluster.local (10.244.1.227): icmp_seq=5 ttl=64 time=0.030 ms

with ssh:

root@compss-matmul-4fc9d6-worker-0:/# ssh -vvv compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
OpenSSH_8.9p1 Ubuntu-3ubuntu0.10, OpenSSL 3.0.2 15 Mar 2022
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/root/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/root/.ssh/known_hosts2'
debug2: resolving "compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61" port 22
debug3: resolve_host: lookup compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61:22
debug3: ssh_connect_direct: entering
debug1: Connecting to compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61 [10.244.1.227] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x10
debug1: connect to address 10.244.1.227 port 22: Connection refused
ssh: connect to host compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61 port 22: Connection refused

albacanete avatar Nov 19 '24 15:11 albacanete

Further debugging: even though I executed the command skupper -n compss expose statefulset compss-matmul-4fc9d61-worker --headless --port 22, when trying to list the exposed services I get nothing...

acanete@rpi42:~$ skupper -n compss service status
No services defined

albacanete avatar Nov 19 '24 15:11 albacanete

Some clusters might have SecurityContextConstraints preventing pods from running as root, and in that case they won't be able to bind system ports (<1024). I am not sure if that is what you're facing, but make sure the worker pods created by Skupper on the remote cluster do not have any issue binding port 22, for example:

$ kubectl logs compss-matmul-4fc9d61-worker-0 | grep denied | tail -1
2024-11-20 14:12:02.035365 +0000 FLOW_LOG (info) LOG [hlGYm:11628922] BEGIN END parent=hlGYm:0 logSeverity=3 logText=LOG_ROUTER: Listener ingress:22: proactor listener error on 0.0.0.0:22: proton:io (Permission denied - listen on 0.0.0.0:22) sourceFile=/build/src/adaptors/adaptor_listener.c sourceLine=172

This could indicate that the pods created by Skupper are unable to bind port 22.

Anyway, I have made some small modifications to your original statefulset to use port 2222 instead, as a way to ensure the system ports are not the root cause.

https://gist.github.com/fgiorgetti/953722df46088a98b2f5f49d6a22ec93
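
The essential change is moving everything off port 22. Roughly, as a sketch (the gist above is authoritative; the -p flag simply makes sshd listen on 2222):

    # Service ports
    - name: port-2222
      protocol: TCP
      port: 2222
      targetPort: ssh-port

    # StatefulSet container
    command: [ "/usr/sbin/sshd", "-D", "-p", "2222" ]
    ports:
    - containerPort: 2222
      name: ssh-port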

I have deployed the Statefulset above (basically yours with a custom image) to a local cluster named west. Then I linked the west cluster to a remote cluster I am calling east.

At this point, the statefulset is running on the west cluster and I have not yet exposed it to the Skupper network.

Here is how it looks from the west cluster:

west $ kubectl get pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP             NODE       NOMINATED NODE   READINESS GATES
compss-matmul-4fc9d61-worker-0   1/1     Running   0          8m47s   10.244.5.213   minikube   <none>           <none>
compss-matmul-4fc9d61-worker-1   1/1     Running   0          8m23s   10.244.5.214   minikube   <none>           <none>

west $ kubectl get service compss-matmul-4fc9d61 -o wide
NAME                    TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE    SELECTOR
compss-matmul-4fc9d61   ClusterIP   None         <none>        2222/TCP   9m6s   app=compss,wf_id=compss-matmul-4fc9d61

Running an SSH client pod on the "west" cluster, where the SSHD worker pods are actually running, I can establish a connection (note that the IP returned is the pod ip and that skupper does not manipulate IPs or DNS):

west $ kubectl run ssh-client -it --image quay.io/fgiorgetti/rhel9-sshd -- bash

[root@ssh-client /]# ping compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
PING compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.fg1.svc.cluster.local (10.244.5.213) 56(84) bytes of data.
64 bytes from compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.fg1.svc.cluster.local (10.244.5.213): icmp_seq=1 ttl=64 time=0.050 ms

[root@ssh-client /]# ssh -p 2222 root@compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
The authenticity of host '[compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61]:2222 ([10.244.5.213]:2222)' can't be established.
ED25519 key fingerprint is SHA256:lyyTCcGkE2kYBaaIFUzPVYD1vmT4Si/S7mTUPiNTJAs.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61]:2222' (ED25519) to the list of known hosts.
[root@compss-matmul-4fc9d61-worker-0 ~]#

Skupper has not been involved so far. Now, let's expose the statefulset running on the west cluster and try to access its worker pods from the remote cluster (east).

west $ skupper expose statefulset compss-matmul-4fc9d61-worker --port 2222 --headless
statefulset compss-matmul-4fc9d61-worker exposed as compss-matmul-4fc9d61

Looking at the "east" cluster now:

east $ kubectl get pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP              NODE          NOMINATED NODE   READINESS GATES
compss-matmul-4fc9d61-worker-0   1/1     Running   0          24s   172.17.44.224   10.240.0.16   <none>           <none>
compss-matmul-4fc9d61-worker-1   1/1     Running   0          21s   172.17.59.174   10.240.0.4    <none>           <none>

east $ kubectl get service compss-matmul-4fc9d61 -o wide
NAME                    TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE   SELECTOR
compss-matmul-4fc9d61   ClusterIP   None         <none>        2222/TCP   39s   internal.skupper.io/service=compss-matmul-4fc9d61

Now that everything is ready, let me run the ssh-client there. Observe that the IP is correct and that I am able to access the SSH server:

east $ kubectl run ssh-client -it --image quay.io/fgiorgetti/rhel9-sshd -- bash

[root@ssh-client /]# ping compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
PING compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.fg1.svc.cluster.local (172.17.44.224) 56(84) bytes of data.
64 bytes from compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61.fg1.svc.cluster.local (172.17.44.224): icmp_seq=1 ttl=63 time=0.110 ms

[root@ssh-client /]# ssh -p 2222 root@compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61
The authenticity of host '[compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61]:2222 ([172.17.44.224]:2222)' can't be established.
ED25519 key fingerprint is SHA256:lyyTCcGkE2kYBaaIFUzPVYD1vmT4Si/S7mTUPiNTJAs.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[compss-matmul-4fc9d61-worker-0.compss-matmul-4fc9d61]:2222' (ED25519) to the list of known hosts.
Last login: Wed Nov 20 14:53:37 2024 from 10.244.5.215
[root@compss-matmul-4fc9d61-worker-0 ~]#

Would you be able to try again using the modified YAMLs (with port 2222 instead)?


fgiorgetti avatar Nov 20 '24 16:11 fgiorgetti

Hello @fgiorgetti, thanks for the answer! It was indeed a port problem; it worked with port 2222. Just so people know, the error I was getting with port 22 is

2024-11-24 22:04:03.595574 +0000 ROUTER (error) Listener ingress:22: proactor listener error on 0.0.0.0:22: proton:io (Permission denied - listen on 0.0.0.0:22)

which you can get by executing kubectl -n compss logs compss-matmul-4fc9d61-worker-0 | grep denied, where compss-matmul-4fc9d61-worker-0 is one of the exposed pods.

albacanete avatar Nov 25 '24 11:11 albacanete