k8s-rdma-shared-dev-plugin
rdma-shared-dp-ds fails with an error on startup
kubectl logs rdma-shared-dp-ds-545v6 -nkube-system
/bin/k8s-rdma-shared-dp: line 1: syntax error: unexpected ")"
Environment: Ubuntu 22.04; image ghcr.io/mellanox/k8s-rdma-shared-dev-plugin:latest
Issue Resolution: Container Image Version Error
We encountered and resolved a similar error. While we cannot confirm if it's exactly the same issue, we'd like to share our findings.
Root Cause
The container image specification in this YAML file is missing a version tag. Without an explicit tag, the image reference falls back to the latest tag. Because latest is a mutable tag that is moved whenever a new image is pushed, different versions of the container image (ghcr.io/mellanox/k8s-rdma-shared-dev-plugin) may be pulled each time the system is deployed.
Additionally, the latest tag often points to unstable development images rather than to a stable release such as v1.5.3.
For example, the following container image:
user@node02:~# sudo crictl images | grep sriov-network-device-plugin
ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin   latest   b292aa8b6cd71   63.1MB
produces this error:
/bin/k8s-rdma-shared-dp: line 1: syntax error: unexpected ")"
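To check whether a manifest is exposed to this problem, one can test whether each image reference is pinned. The sketch below uses hypothetical helpers (parse_image_ref and is_pinned are not part of the plugin or of any Kubernetes tool). It assumes the usual name[:tag][@digest] reference shape and treats a reference as mutable when it has no digest and either no tag or the tag latest. It does not validate the full image reference grammar.

```python
from typing import Optional, Tuple


def parse_image_ref(ref: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'name[:tag][@digest]' into (name, tag, digest); no validation."""
    name, _, digest = ref.partition("@")
    # A ':' after the last '/' starts the tag; an earlier ':' belongs to a
    # registry host:port such as 'localhost:5000/foo'.
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    if last_colon > last_slash:
        return name[:last_colon], name[last_colon + 1:], digest or None
    return name, None, digest or None


def is_pinned(ref: str) -> bool:
    """True if ref names a digest or an explicit tag other than 'latest'."""
    _, tag, digest = parse_image_ref(ref)
    if digest is not None:
        return True
    # No tag means the runtime falls back to 'latest', which is mutable.
    return tag is not None and tag != "latest"


if __name__ == "__main__":
    for ref in (
        "ghcr.io/mellanox/k8s-rdma-shared-dev-plugin",
        "ghcr.io/mellanox/k8s-rdma-shared-dev-plugin:latest",
        "ghcr.io/mellanox/k8s-rdma-shared-dev-plugin:v1.5.3-amd64",
    ):
        print(f"{ref}: {'pinned' if is_pinned(ref) else 'mutable'}")
```

Only the last reference in the loop is reported as pinned; the first two both resolve to the mutable latest tag.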
Solution
Modify the daemonset.yaml file to specify an explicit version tag for the container image.
You can find available versions and their corresponding SHA256 hashes on the package versions page.
We recommend selecting a version tag that includes both your desired version and the CPU architecture, such as v1.5.3-amd64.
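The change to daemonset.yaml amounts to rewriting the image line. As a minimal sketch, assuming an unquoted image: value and using a trimmed, hypothetical manifest fragment (not the project's actual file), a regular-expression substitution can pin the tag. pin_image is a hypothetical helper, and v1.5.3-amd64 is the example tag mentioned above. Editing the file by hand, or with a YAML-aware tool, achieves the same result.

```python
import re

IMAGE = "ghcr.io/mellanox/k8s-rdma-shared-dev-plugin"


def pin_image(manifest: str, image: str, tag: str) -> str:
    """Rewrite 'image: <image>' or 'image: <image>:<tag>' lines to use `tag`.

    Assumes unquoted values; digests and other images are left untouched.
    """
    pattern = re.compile(
        r"^([ \t]*(?:-[ \t]+)?image:[ \t]*)"  # indentation, optional list dash, key
        + re.escape(image)
        + r"(?::[\w.-]+)?[ \t]*$",  # optional existing tag, then end of line
        re.MULTILINE,
    )
    return pattern.sub(lambda m: f"{m.group(1)}{image}:{tag}", manifest)


# Trimmed, hypothetical DaemonSet fragment for illustration only.
MANIFEST = """\
spec:
  template:
    spec:
      containers:
        - name: rdma-shared-dp
          image: ghcr.io/mellanox/k8s-rdma-shared-dev-plugin:latest
"""

if __name__ == "__main__":
    print(pin_image(MANIFEST, IMAGE, "v1.5.3-amd64"))
```

A tag alone can still be re-pushed; pinning the SHA256 digest from the package versions page instead (image@sha256:...) makes the reference fully immutable.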
We encountered a similar error, too, and traced it to the new Dockerfile. Details can be seen in https://github.com/Mellanox/k8s-rdma-shared-dev-plugin/issues/228.
Here are the solutions we found:
- replace CMD ["/bin/k8s-rdma-shared-dp"] with ENTRYPOINT ["/bin/k8s-rdma-shared-dp"];
- replace the Dockerfile with the one in branch "1.5.3";
- unset the entrypoint (/busybox/sh) of nvcr.io/nvidia/distroless/go:v3.2.1-dev, which requires help from that image's developer.
Any one of the solutions above is enough to solve this issue.
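Why the first and third workarounds help follows from how Docker combines exec-form ENTRYPOINT and CMD: when an ENTRYPOINT is set, CMD is appended to it as arguments, and setting ENTRYPOINT in a Dockerfile resets any CMD inherited from the base image. With an entrypoint of /busybox/sh, the container therefore starts /busybox/sh /bin/k8s-rdma-shared-dp, and the shell tries to read the plugin binary as a script, which would explain the "line 1: syntax error" message. The sketch below models only those two rules. ImageConfig, build, and argv are hypothetical names, shell-form instructions are not modelled, and the base image is assumed to define no CMD of its own.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImageConfig:
    """Exec-form ENTRYPOINT and CMD of an image (None means unset)."""
    entrypoint: Optional[List[str]] = None
    cmd: Optional[List[str]] = None


def build(base: ImageConfig,
          entrypoint: Optional[List[str]] = None,
          cmd: Optional[List[str]] = None) -> ImageConfig:
    """Apply a derived Dockerfile's exec-form ENTRYPOINT/CMD on top of `base`."""
    new_entrypoint, new_cmd = base.entrypoint, base.cmd
    if entrypoint is not None:
        new_entrypoint = entrypoint
        new_cmd = None  # setting ENTRYPOINT drops the CMD inherited from base
    if cmd is not None:
        new_cmd = cmd  # a CMD set in the derived Dockerfile is kept
    return ImageConfig(new_entrypoint, new_cmd)


def argv(image: ImageConfig) -> List[str]:
    """Command line the container starts with: ENTRYPOINT followed by CMD."""
    return list(image.entrypoint or []) + list(image.cmd or [])


BINARY = "/bin/k8s-rdma-shared-dp"
# Entrypoint as reported for nvcr.io/nvidia/distroless/go:v3.2.1-dev; no CMD assumed.
base = ImageConfig(entrypoint=["/busybox/sh"])

broken = build(base, cmd=[BINARY])             # new Dockerfile: CMD only
fixed = build(base, entrypoint=[BINARY])       # workaround 1: ENTRYPOINT instead
no_shell = build(ImageConfig(), cmd=[BINARY])  # workaround 3: base entrypoint unset

print(argv(broken))    # ['/busybox/sh', '/bin/k8s-rdma-shared-dp']
print(argv(fixed))     # ['/bin/k8s-rdma-shared-dp']
print(argv(no_shell))  # ['/bin/k8s-rdma-shared-dp']
```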