helm-charts
[newrelic-pixie] newrelic-pixie init container not running on arm64
Bug description
The newrelic-pixie chart fails to install on arm64 nodes.
Version of Helm and Kubernetes
Any version, as long as the nodes are arm64. Tested on AKS, Kubernetes v1.26.6, with node pool template Standard_D2pds_v5 (arm64).
Which chart?
helm search repo newrelic-pixie
NAME                     CHART VERSION  APP VERSION  DESCRIPTION
newrelic/newrelic-pixie  2.1.2          2.1.4        A Helm chart for the New Relic Pixie integration.
What happened?
The newrelic-pixie job fails 5 times in quick succession after scheduling to an arm64 node.
Logs for the cluster-registration-wait container include this message:
exec /bin/sh: exec format error
What you expected to happen?
Expecting the init container to work with arm64.
How to reproduce it?
Add an arm64 node pool to your cluster. Taint the other node groups. Process per this guide.
Install the New Relic bundle with Pixie enabled.
Anything else we need to know?
This is the container image for the container that's not running on arm64:
Image: gcr.io/pixie-oss/pixie-dev-public/curl:1.0
Image ID: gcr.io/pixie-oss/pixie-dev-public/curl@sha256:b57f1d617b3eded350e2f78a5eece0c0839c59f59f1dece39f413f599dc382b1
https://issues.newrelic.com/browse/NR-167770
https://new-relic.atlassian.net/browse/NR-167770
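An "exec format error" generally means the binary inside the image was built for a different CPU architecture than the node it landed on. One way to confirm that for an image like the curl one above is to look at which architectures its manifest list advertises. The helpers below are a minimal sketch, not part of the chart: the function names list_architectures and supports_arch are made up for illustration, and they assume the manifest JSON comes from something like `docker manifest inspect`, which needs a Docker CLI with access to the registry.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch).

# list_architectures: read manifest-list JSON on stdin and print each
# distinct "architecture" value, one per line.
list_architectures() {
  grep -o '"architecture"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort -u
}

# supports_arch ARCH: succeed if ARCH is among the architectures
# listed in the manifest JSON on stdin.
supports_arch() {
  list_architectures | grep -qxF "$1"
}

# Example usage (assumes docker is installed and can reach the registry):
#   docker manifest inspect gcr.io/pixie-oss/pixie-dev-public/curl:1.0 \
#     | supports_arch arm64 || echo "no arm64 variant listed"
```

If the image is single-platform, the plain manifest may carry no "architecture" field at all, in which case list_architectures prints nothing and supports_arch fails. That is consistent with an image that was never published for arm64, but it is not proof on its own.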
The pixie repo seems to use this "multiarch" tagged image.
$ git grep 'pixie-dev-public\/curl' | grep '^k8s'
k8s/cloud/base/ory_auth/kratos/kratos_deployment.yaml: image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/devinfra/buildbuddy-executor/values.yaml: image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/base/kelvin_deployment.yaml: image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/base/patch_sentry.yaml: image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/base/query_broker_deployment.yaml: image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/bootstrap/cloud_connector_deployment.yaml: image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/etcd_metadata/base/metadata_deployment.yaml: image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/etcd_metadata/base/metadata_deployment.yaml: image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/pem/base/pem_daemonset.yaml: image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/persistent_metadata/base/metadata_statefulset.yaml: image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/sanitizer/kelvin_deployment.yaml: image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
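The same kind of check can be run against the Helm chart itself, to see which image references it actually renders. The following is a rough sketch: list_images is a hypothetical helper name, it only handles simple single-line `image:` entries, and the `helm template` invocation in the comment is an assumption about how the chart would be rendered (the chart may need its required values passed with --set before it renders).

```shell
#!/bin/sh
# list_images: read rendered Kubernetes YAML on stdin and print each
# distinct value of an "image:" key, one per line.
# Hypothetical helper; handles "image: x" and "- image: x" lines only.
list_images() {
  sed -n 's/^[[:space:]-]*image:[[:space:]]*//p' \
    | tr -d "\"'" \
    | sort -u
}

# Example usage (assumes the newrelic repo is added to helm and any
# required chart values are supplied):
#   helm template pixie-test newrelic/newrelic-pixie | list_images
```

Any image in that list that is not tagged or published as multi-arch is a candidate for the exec format error on arm64 nodes.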
My suspicion is that the helm chart may not have the latest changes to pull in the correct image.
I missed that this wasn't the pixie-operator helm chart, but the newrelic-pixie chart. I believe we need to replace this image with the one I mentioned above.
After investigating this more, the curl image isn't the only one to address. The newrelic/newrelic-pixie-integration repo isn't publishing container images for ARM. I've validated with @maxlemieux's help that if those two things are addressed, the chart installs successfully.
The newrelic/newrelic-pixie-integration repo's v2.2.0 release supports ARM builds now. We can now update the helm-chart to use this version and fix the curl issue mentioned above.
The curl container issue seems to be fixed with this update, but the main container (not the init container) now shows the same issue with exec format.
This will be addressed once #1198 is merged and a new nri-bundle release is made. Thanks for all your help through this @maxlemieux!