k8s-rdma-shared-dev-plugin

rdma-shared-dp-ds server error on startup

Open leogoing opened this issue 1 month ago • 1 comments

kubectl logs rdma-shared-dp-ds-545v6 -nkube-system

/bin/k8s-rdma-shared-dp: line 1: syntax error: unexpected ")"

leogoing, Oct 15 '25 08:10

OS: Ubuntu 22.04, image: ghcr.io/mellanox/k8s-rdma-shared-dev-plugin:latest

leogoing, Oct 15 '25 08:10

Issue Resolution: Container Image Version Error

We encountered and resolved a similar error. While we cannot confirm if it's exactly the same issue, we'd like to share our findings.

Root Cause

The container image specification in this YAML file is missing a version tag. Without an explicit tag, the image reference defaults to latest, so a different build of the container image (ghcr.io/mellanox/k8s-rdma-shared-dev-plugin) may be pulled each time the system is deployed.

Additionally, the latest tag often contains unstable container images rather than stable releases like v1.5.3.

For example, the following container image:

user@node02:~# sudo crictl images | grep sriov-network-device-plugin
ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin   latest   b292aa8b6cd71   63.1MB

produces this error: /bin/k8s-rdma-shared-dp: line 1: syntax error: unexpected ")"

Solution

Modify the daemonset.yaml file to specify an explicit version tag for the container image.

You can find available versions and their corresponding SHA256 hashes on the package versions page.

We recommend selecting a version tag that includes both your desired version and CPU architecture, such as v1.5.3-amd64.
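As a rough illustration of the change described above, the following shell sketch rewrites the image line of a manifest so that it carries an explicit tag. The image name and the v1.5.3-amd64 tag come from this thread; pin_image_tag is a hypothetical helper written for this example, and the manifest fragment is a throwaway stand-in for the real daemonset.yaml, whose exact layout is assumed here.

```shell
#!/bin/sh
# Sketch: pin the plugin image to an explicit tag in a DaemonSet manifest.
# IMAGE and TAG are taken from this thread; pin_image_tag is a hypothetical
# helper, not something shipped with the plugin repository.
IMAGE="ghcr.io/mellanox/k8s-rdma-shared-dev-plugin"
TAG="v1.5.3-amd64"

pin_image_tag() {
    # Rewrites "image: IMAGE" or "image: IMAGE:<any tag>" to "image: IMAGE:TAG".
    # Reads the named file(s), or stdin when no file is given; prints to stdout.
    sed -E "s#(image:[[:space:]]*${IMAGE})(:[^[:space:]]*)?[[:space:]]*\$#\1:${TAG}#" "$@"
}

# Demonstration on a temporary fragment with an untagged image line
# (assumed layout; the real daemonset.yaml may differ).
manifest=$(mktemp)
printf '      containers:\n      - image: %s\n' "$IMAGE" > "$manifest"
pin_image_tag "$manifest"
rm -f "$manifest"
```

To apply the same idea to a real manifest, one would redirect the output to a new file, review it, and then deploy it with kubectl apply -f as usual.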

tx-y-hiraka, Nov 10 '25 09:11

We encountered a similar error, and we identified the new Dockerfile as the cause. Details can be seen in https://github.com/Mellanox/k8s-rdma-shared-dev-plugin/issues/228.

Here are the solutions we have figured out:

  1. Replace CMD ["/bin/k8s-rdma-shared-dp"] with ENTRYPOINT ["/bin/k8s-rdma-shared-dp"] in the Dockerfile.
  2. Replace the Dockerfile with the one from the "1.5.3" branch.
  3. Unset the Entrypoint (/busybox/sh) of nvcr.io/nvidia/distroless/go:v3.2.1-dev, which requires help from the developers of that image.

Any one of the solutions above is enough to solve this issue.
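One possible reading of why these fixes work, offered as an assumption rather than a confirmed diagnosis: with exec-form instructions, a container runtime runs ENTRYPOINT followed by CMD. An inherited Entrypoint of /busybox/sh combined with CMD ["/bin/k8s-rdma-shared-dp"] would therefore pass the plugin binary to the shell as a script, which fits the "line 1: syntax error" message. The sketch below only models that concatenation; effective_argv is a hypothetical helper and does not inspect any real image.

```shell
#!/bin/sh
# Sketch: how ENTRYPOINT and CMD (exec form) combine into the command a
# container runs. effective_argv is a hypothetical helper for illustration.
effective_argv() {
    # $1 = ENTRYPOINT (may be empty), $2 = CMD (may be empty)
    if [ -n "$1" ] && [ -n "$2" ]; then
        printf '%s %s\n' "$1" "$2"   # ENTRYPOINT first, CMD appended as arguments
    else
        printf '%s\n' "$1$2"         # only one of the two is set
    fi
}

# Inherited Entrypoint /busybox/sh plus the Dockerfile's CMD:
# the shell receives the plugin binary as a script to interpret.
effective_argv "/busybox/sh" "/bin/k8s-rdma-shared-dp"

# Solution 1: ENTRYPOINT set to the plugin binary, no CMD:
# the binary is executed directly.
effective_argv "/bin/k8s-rdma-shared-dp" ""
```

Under that reading, solutions 1 and 3 both remove /busybox/sh from the front of the command, only at different layers (the plugin's Dockerfile versus the nvcr.io/nvidia/distroless/go image).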

fsqHub, Nov 17 '25 08:11