
[newrelic-pixie] newrelic-pixie init container not running on arm64

Open maxlemieux opened this issue 1 year ago • 8 comments

Bug description

The newrelic-pixie chart fails to install on arm64 nodes.

Version of Helm and Kubernetes

Any version, as long as the nodes are arm64. Tested on AKS, Kubernetes v1.26.6, with node pool template Standard_D2pds_v5 (arm64).

Which chart?

helm search repo newrelic-pixie
NAME                   	CHART VERSION	APP VERSION	DESCRIPTION                                      
newrelic/newrelic-pixie	2.1.2        	2.1.4      	A Helm chart for the New Relic Pixie integration.

What happened?

The newrelic-pixie job fails 5 times in quick succession after scheduling to an arm64 node.

Logs for the cluster-registration-wait container include this message:

exec /bin/sh: exec format error
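
For context, "exec format error" is what the kernel reports when it is asked to run a binary it cannot execute, typically one built for a different CPU architecture than the node. A minimal sketch for confirming which architecture the nodes report is below. The helper names `to_oci_arch` and `list_node_arches` are made up for illustration; `kubernetes.io/arch` is the standard node label.

```shell
# Map a `uname -m` machine name to the architecture name used by
# Kubernetes node labels and container image platforms.
to_oci_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "$1" ;;   # pass anything else through unchanged
  esac
}

# Show every node with its architecture label as an extra column.
# Requires kubectl access to the cluster.
list_node_arches() {
  kubectl get nodes -L kubernetes.io/arch
}
```

Running `to_oci_arch "$(uname -m)"` on a node (or inside a debug pod) should give the same value that `list_node_arches` shows for that node.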

What you expected to happen?

Expecting the init container to work with arm64.

How to reproduce it?

Add an arm64 node pool to your cluster. Taint the other node groups. Proceed per this guide.

Install the New Relic bundle with Pixie enabled.
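
The tainting step could look roughly like the sketch below. It is an assumption, not the exact procedure from the guide: the taint key and value (`arch=amd64`) and the function name are hypothetical, and it assumes the other node groups are amd64.

```shell
# Taint every amd64 node so pods without a matching toleration are
# scheduled onto the arm64 pool instead.
# The taint "arch=amd64:NoSchedule" is an illustrative choice.
taint_amd64_nodes() {
  kubectl taint nodes -l kubernetes.io/arch=amd64 arch=amd64:NoSchedule
}
```

After calling `taint_amd64_nodes`, new pods from the chart should only land on arm64 nodes, which is the condition that triggers the failure.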

Anything else we need to know?

This is the container image for the container that's not running on arm64:

Image:         gcr.io/pixie-oss/pixie-dev-public/curl:1.0
Image ID:      gcr.io/pixie-oss/pixie-dev-public/curl@sha256:b57f1d617b3eded350e2f78a5eece0c0839c59f59f1dece39f413f599dc382b1
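
One way to check whether an image tag publishes an arm64 variant is to look at the architectures listed in its manifest. The sketch below is a plain-text match rather than a real JSON parser, and `manifest_has_arch` is a hypothetical helper name. It assumes the manifest JSON contains `"architecture"` fields, as a multi-arch manifest list does; a single-arch image may not list any, in which case the check simply reports no match.

```shell
# Read an image manifest (JSON) on stdin and succeed if it lists the
# architecture given as the first argument.
manifest_has_arch() {
  grep -Eq "\"architecture\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Example (requires Docker with the `manifest` command and registry access):
#   docker manifest inspect gcr.io/pixie-oss/pixie-dev-public/curl:1.0 \
#     | manifest_has_arch arm64 && echo "arm64 listed" || echo "arm64 not listed"
```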

maxlemieux avatar Oct 04 '23 23:10 maxlemieux

https://issues.newrelic.com/browse/NR-167770

https://new-relic.atlassian.net/browse/NR-167770

The pixie repo seems to use a "multiarch"-tagged curl image:

$ git grep 'pixie-dev-public\/curl' | grep '^k8s'
k8s/cloud/base/ory_auth/kratos/kratos_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/devinfra/buildbuddy-executor/values.yaml:  image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/base/kelvin_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/base/patch_sentry.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/base/query_broker_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/bootstrap/cloud_connector_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/etcd_metadata/base/metadata_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/etcd_metadata/base/metadata_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/pem/base/pem_daemonset.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/persistent_metadata/base/metadata_statefulset.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd
k8s/vizier/sanitizer/kelvin_deployment.yaml:        image: gcr.io/pixie-oss/pixie-dev-public/curl:multiarch-7.87.0@sha256:f7f265d5c64eb4463a43a99b6bf773f9e61a50aaa7cefaf564f43e42549a01dd

My suspicion is that the helm chart may not have the latest changes to pull in the correct image.

ddelnano avatar Oct 18 '23 16:10 ddelnano

I missed that this wasn't the pixie-operator helm chart, but the newrelic-pixie chart. I believe we need to replace this image with the one I mentioned above.

ddelnano avatar Oct 18 '23 16:10 ddelnano

After investigating this more, the curl image isn't the only thing to address. The newrelic/newrelic-pixie-integration repo isn't publishing container images for ARM either. I've validated with @maxlemieux's help that the chart installs successfully once both of those are addressed.

ddelnano avatar Oct 19 '23 18:10 ddelnano

The newrelic/newrelic-pixie-integration repo's v2.2.0 release supports ARM builds now. We can now update the helm-chart to use this version and fix the curl issue mentioned above.

ddelnano avatar Nov 28 '23 18:11 ddelnano

The curl container issue seems to be fixed with this update, but the main container (not the init container) now fails with the same exec format error.

maxlemieux avatar Nov 29 '23 02:11 maxlemieux

This will be addressed once #1198 is merged and a new nri-bundle release is made. Thanks for all your help through this @maxlemieux!

ddelnano avatar Nov 30 '23 17:11 ddelnano
