Support ARM CPUs
Describe the bug
Pixie fails to deploy on an ARM-based (Raspberry Pi 4) Microk8s cluster: the cert-provisioner jobs exit with errors, the Cloud Connector never comes online, and px deploy times out waiting for cluster ID assignment.
To Reproduce
Steps to reproduce the behavior:
- Follow this tutorial to deploy Microk8s on a Pi cluster. I used the 1.19 channel for Microk8s.
- Enable the following features in Microk8s (the equivalent commands are sketched after these steps):
addons:
  enabled:
    dashboard        # The Kubernetes dashboard
    dns              # CoreDNS
    ha-cluster       # Configure high availability on the current node
    host-access      # Allow Pods connecting to Host services smoothly
    ingress          # Ingress controller for external access
    metrics-server   # K8s Metrics Server for API access to service metrics
    rbac             # Role-Based Access Control for authorisation
    registry         # Private image registry exposed on localhost:32000
    storage          # Storage class; allocates storage from host directory
- Create an account with cluster admin privilege using certificates.
- Create a kubeconfig on an external instance of kubectl using that cluster admin account
- Install the Pixie CLI (px) alongside kubectl
- Deploy the Pixie Demo
- Deploy Pixie
- Pods fail in various states. Log output for each pod below.
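For reference, the steps above correspond roughly to the commands below (a sketch, assuming the Microk8s 1.19 addon names listed above and the standard Pixie demo app; the certificate-based cluster-admin account and kubeconfig steps are environment-specific and omitted):

# Enable the Microk8s addons listed above
microk8s enable dns dashboard ha-cluster host-access ingress metrics-server rbac registry storage

# From the external kubectl host: deploy the Pixie demo application, then Pixie itself
px demo deploy px-sock-shop
px deploy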
Expected behavior
Successful deployment.
Logs
boboysdadda@DESKTOP-US92ARK:~$ px deploy
Pixie CLI
Running Cluster Checks:
✔ Kernel version > 4.14.0
✔ Cluster type is supported
✔ K8s version > 1.12.0
✔ Kubectl > 1.10.0 is present
✔ User can create namespace
Installing version: 0.5.2
Generating YAMLs for Pixie
Deploying Pixie to the following cluster: microk8s
Is the cluster correct? (y/n) [y] : y
Found 5 nodes
✔ Creating namespace
✔ Deleting stale Pixie objects, if any
✔ Deploying secrets and configmaps
✔ Deploying Cloud Connector
⠼ Waiting for Cloud Connector to come online
[0142] FATAL Timed out waiting for cluster ID assignment
boboysdadda@DESKTOP-US92ARK:~$ kubectl get pods -n pl
NAME READY STATUS RESTARTS AGE
vizier-cloud-connector-5696d4d66b-2td4h 0/1 ContainerCreating 0 20h
cert-provisioner-job-kjw6n 0/1 Error 0 20h
cert-provisioner-job-trgcv 0/1 Error 0 20h
boboysdadda@DESKTOP-US92ARK:~$ kubectl describe pod cert-provisioner-job-kjw6n -n pl
Name: cert-provisioner-job-kjw6n
Namespace: pl
Priority: 0
Node: pi4-k8s-node4/192.168.2.14
Start Time: Tue, 06 Oct 2020 21:53:08 -0600
Labels: app=pl-monitoring
component=vizier
controller-uid=32a94f06-a0fb-4db7-889d-98ea8636cfa4
job-name=cert-provisioner-job
vizier-bootstrap=true
Annotations: cni.projectcalico.org/podIP: 10.1.217.16/32
cni.projectcalico.org/podIPs: 10.1.217.16/32
Status: Failed
IP: 10.1.217.16
IPs:
IP: 10.1.217.16
Controlled By: Job/cert-provisioner-job
Containers:
provisioner:
Container ID: containerd://19bda16f93f349759f9d54878904d2d1eb0003d68b244bc8f393a60270fa7545
Image: gcr.io/pixie-prod/vizier/cert_provisioner_image:0.5.2
Image ID: gcr.io/pixie-prod/vizier/cert_provisioner_image@sha256:e76e704b00259fe4f8bee6b8761b3676dc57a3119f153a1f980ca390f2387a9b
Port: <none>
Host Port: <none>
State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 06 Oct 2020 21:53:11 -0600
Finished: Tue, 06 Oct 2020 21:53:11 -0600
Ready: False
Restart Count: 0
Environment Variables from:
pl-cloud-config ConfigMap Optional: false
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from updater-service-account-token-4d7gj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
updater-service-account-token-4d7gj:
Type: Secret (a volume populated by a Secret)
SecretName: updater-service-account-token-4d7gj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 20h default-scheduler Successfully assigned pl/cert-provisioner-job-kjw6n to pi4-k8s-node4
Normal Pulled 20h kubelet, pi4-k8s-node4 Container image "gcr.io/pixie-prod/vizier/cert_provisioner_image:0.5.2" already present on machine
Normal Created 20h kubelet, pi4-k8s-node4 Created container provisioner
Normal Started 20h kubelet, pi4-k8s-node4 Started container provisioner
Name: cert-provisioner-job-trgcv
Namespace: pl
Priority: 0
Node: pi4-k8s-node4/192.168.2.14
Start Time: Tue, 06 Oct 2020 21:53:12 -0600
Labels: app=pl-monitoring
component=vizier
controller-uid=32a94f06-a0fb-4db7-889d-98ea8636cfa4
job-name=cert-provisioner-job
vizier-bootstrap=true
Annotations: cni.projectcalico.org/podIP: 10.1.217.17/32
cni.projectcalico.org/podIPs: 10.1.217.17/32
Status: Failed
IP: 10.1.217.17
IPs:
IP: 10.1.217.17
Controlled By: Job/cert-provisioner-job
Containers:
provisioner:
Container ID: containerd://07ec2b65d5f38a892341e639b082fb6968ccee51d1c64c3085e228ca034c1f71
Image: gcr.io/pixie-prod/vizier/cert_provisioner_image:0.5.2
Image ID: gcr.io/pixie-prod/vizier/cert_provisioner_image@sha256:e76e704b00259fe4f8bee6b8761b3676dc57a3119f153a1f980ca390f2387a9b
Port: <none>
Host Port: <none>
State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 06 Oct 2020 21:53:14 -0600
Finished: Tue, 06 Oct 2020 21:53:14 -0600
Ready: False
Restart Count: 0
Environment Variables from:
pl-cloud-config ConfigMap Optional: false
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from updater-service-account-token-4d7gj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
updater-service-account-token-4d7gj:
Type: Secret (a volume populated by a Secret)
SecretName: updater-service-account-token-4d7gj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 20h default-scheduler Successfully assigned pl/cert-provisioner-job-trgcv to pi4-k8s-node4
Normal Pulled 20h kubelet, pi4-k8s-node4 Container image "gcr.io/pixie-prod/vizier/cert_provisioner_image:0.5.2" already present on machine
Normal Created 20h kubelet, pi4-k8s-node4 Created container provisioner
Normal Started 20h kubelet, pi4-k8s-node4 Started container provisioner
boboysdadda@DESKTOP-US92ARK:~$ kubectl describe pod -n pl vizier-cloud-connector-5696d4d66b-2td4h
Name: vizier-cloud-connector-5696d4d66b-2td4h
Namespace: pl
Priority: 0
Node: pi4-k8s-node4/192.168.2.14
Start Time: Tue, 06 Oct 2020 21:53:08 -0600
Labels: app=pl-monitoring
component=vizier
name=vizier-cloud-connector
plane=control
pod-template-hash=5696d4d66b
vizier-bootstrap=true
Annotations: fluentbit.io/parser: logfmt
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/vizier-cloud-connector-5696d4d66b
Containers:
app:
Container ID:
Image: gcr.io/pixie-prod/vizier/cloud_connector_server_image:0.5.2
Image ID:
Port: 50800/TCP
Host Port: 50800/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Liveness: http-get https://:50800/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
pl-cloud-config ConfigMap Optional: false
pl-cloud-connector-tls-config ConfigMap Optional: false
pl-cloud-connector-bootstrap-config ConfigMap Optional: true
Environment:
PL_POD_NAME: vizier-cloud-connector-5696d4d66b-2td4h (v1:metadata.name)
PL_JWT_SIGNING_KEY: <set to the key 'jwt-signing-key' in secret 'pl-cluster-secrets'> Optional: false
PL_CLUSTER_ID: <set to the key 'cluster-id' in secret 'pl-cluster-secrets'> Optional: true
PL_SENTRY_DSN:
PL_DEPLOY_KEY: <set to the key 'deploy-key' in secret 'pl-deploy-secrets'> Optional: true
PL_POD_NAMESPACE: pl (v1:metadata.namespace)
PL_MAX_EXPECTED_CLOCK_SKEW: 2000
PL_RENEW_PERIOD: 1000
Mounts:
/certs from certs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from cloud-conn-service-account-token-jlg49 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
certs:
Type: Secret (a volume populated by a Secret)
SecretName: service-tls-certs
Optional: false
cloud-conn-service-account-token-jlg49:
Type: Secret (a volume populated by a Secret)
SecretName: cloud-conn-service-account-token-jlg49
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 20h default-scheduler Successfully assigned pl/vizier-cloud-connector-5696d4d66b-2td4h to pi4-k8s-node4
Warning FailedMount 20h (x12 over 20h) kubelet, pi4-k8s-node4 MountVolume.SetUp failed for volume "certs" : secret "service-tls-certs" not found
Warning FailedMount 20h (x4 over 20h) kubelet, pi4-k8s-node4 Unable to attach or mount volumes: unmounted volumes=[certs], unattached volumes=[certs cloud-conn-service-account-token-jlg49]: timed out waiting for the condition
Warning FailedMount 84s (x12 over 9m38s) kubelet, pi4-k8s-node4 MountVolume.SetUp failed for volume "certs" : secret "service-tls-certs" not found
Warning FailedMount 49s (x4 over 7m35s) kubelet, pi4-k8s-node4 Unable to attach or mount volumes: unmounted volumes=[certs], unattached volumes=[certs cloud-conn-service-account-token-jlg49]: timed out waiting for the condition
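For anyone hitting the same failure, the describe output above suggests the failure chain: the cert-provisioner job keeps erroring out, so the service-tls-certs secret is never created, and the Cloud Connector pod can never mount its certs volume or come online. A sketch of follow-up commands to capture the underlying job error (pod names are the ones from this report):

# Pull logs from the failed cert-provisioner pods to see why they exit with code 1
kubectl logs -n pl cert-provisioner-job-kjw6n
kubectl logs -n pl cert-provisioner-job-trgcv

# Confirm whether the TLS secret the Cloud Connector is waiting on was ever created
kubectl get secret -n pl service-tls-certs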
App information:
- Pixie version - 0.3.9+Distribution.8d5651b.20201006163355.1
- K8s cluster version - Microk8s 1.19 on Pi4 cluster
ubuntu@pi4-k8s-master:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
pi4-k8s-master Ready <none> 32h v1.19.0-34+ff9309c628eb68
pi4-k8s-node4 Ready <none> 32h v1.19.0-34+ff9309c628eb68
pi4-k8s-node1 Ready <none> 32h v1.19.0-34+ff9309c628eb68
pi4-k8s-node3 Ready <none> 32h v1.19.0-34+ff9309c628eb68
pi4-k8s-node2 Ready <none> 32h v1.19.0-34+ff9309c628eb68
The root of this issue is most likely the fact that we don't currently support ARM CPUs. Separately, we should confirm that Pixie works with microk8s.
Will just add a +1 for ARM CPUs here.
Definitely needed on AWS. Either that, or the px CLI and the operator pods should at least have an affinity to be scheduled on amd64 nodes only; otherwise deployment will just randomly fail. How can one set this manually for the pixie-operator-index pods? They don't seem to be created by a Deployment or StatefulSet, so we can't inject that ourselves currently.
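For pods that are managed by a Deployment, the usual way to keep them off ARM nodes is a nodeSelector on the standard kubernetes.io/arch node label; a minimal sketch (the Deployment name and namespace below are placeholders, and as noted above this doesn't directly cover the pixie-operator-index pods):

# Sketch: pin an existing Deployment's pods to amd64 nodes via the standard arch label
kubectl patch deployment <deployment-name> -n <namespace> --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/arch":"amd64"}}}}}'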
We too are waiting for this bug fix. As ARM-based instances are cheaper, we are leveraging them as much as possible. I hope that this will be addressed soon.
We're deep into ARM64 territory with our new infrastructure and services, and we cannot run New Relic's latest K8s observability stack due to a lack of ARM64 compatibility with pixie :-(
+1 to ARM64 builds & support...<3
Many cloud providers offer ARM Kubernetes nodes, and since they are cheaper they should become a popular choice. I'd like to know whether there is work on this; right now I can't use Pixie in my cluster for that reason. Thanks.
A quick update for those waiting for ARM support: we do have plans to support ARM, but it’s not yet under development. This work is tentatively planned for Q1'23 and we'll update this thread once we have more to share!
Looking forward to ARM64 support so much!
Hi all, thanks for your patience. With our latest release of Vizier, 0.12.17, we now have early alpha support for ARM64. Our testing infrastructure doesn't yet support ARM, so we'd like to emphasize that this is a very early version of ARM support. We're planning on fixing our testing infrastructure in the near future, and will announce more confident support for ARM at that time.
To deploy to an ARM cluster, make sure there's at least 1 x86 node in the cluster (see known issue 1), and then just run px deploy as usual.
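One quick way to confirm that requirement before deploying is to list each node's CPU architecture via the standard kubernetes.io/arch label:

# At least one node should report amd64 for the operator deploy to schedule
kubectl get nodes -L kubernetes.io/arch

# Then deploy Pixie as usual
px deploy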
A few known issues:
- If you are doing an operator deploy of vizier (which is the default), you must have at least 1 x86 node in your cluster. This issue is tracked at #892.
- Java profiling is not yet supported on ARM (tracked at #891).
- The etcd version of Vizier is not yet supported on ARM (tracked at #893). This means your cluster must have support for persistent volumes to work with Vizier on ARM (a quick check is sketched after this list).
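A quick check for the persistent-volume requirement mentioned in the last item (a sketch using standard kubectl commands):

# Verify the cluster can provision persistent volumes for the non-etcd Vizier
kubectl get storageclass
kubectl get pv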
We'd love for the community to help us test ARM support. If you run into problems, feel free to open a new issue describing the specific problem on ARM, and please prefix your issue title with [ARM].
Any updates on this? It seems ARM is still not supported, but this issue is marked closed.