
Support ARM CPUs

JamesAtIntegratnIO opened this issue

Describe the bug: Pixie fails to deploy on an ARM64 (Raspberry Pi 4) MicroK8s cluster. The cert-provisioner jobs exit with errors, the Cloud Connector pod is stuck in ContainerCreating because the service-tls-certs secret is never created, and px deploy times out waiting for cluster ID assignment.

To Reproduce: Steps to reproduce the behavior:

  1. Follow this tutorial to deploy MicroK8s on a Raspberry Pi cluster. I used the channel for MicroK8s 1.19.
  2. Enable the following add-ons in MicroK8s (a command sketch follows this list):
addons:
  enabled:
    dashboard            # The Kubernetes dashboard
    dns                  # CoreDNS
    ha-cluster           # Configure high availability on the current node
    host-access          # Allow Pods connecting to Host services smoothly
    ingress              # Ingress controller for external access
    metrics-server       # K8s Metrics Server for API access to service metrics
    rbac                 # Role-Based Access Control for authorisation
    registry             # Private image registry exposed on localhost:32000
    storage              # Storage class; allocates storage from host directory
  3. Create an account with cluster-admin privileges using certificates.
  4. Create a kubeconfig on an external instance of kubectl using that cluster-admin account.
  5. Install the Pixie CLI alongside kubectl.
  6. Deploy the Pixie demo.
  7. Deploy Pixie.
  8. Pods fail in various states. Log output for each pod is below.
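
A minimal sketch of steps 2 and 4 using the stock MicroK8s CLI (steps 3 and 4 were actually done with a hand-made certificate-based cluster-admin account; the microk8s config line below is just the simplest equivalent):

# Enable the add-ons listed above (names as reported by `microk8s status`)
microk8s enable dashboard dns ha-cluster host-access ingress metrics-server rbac registry storage

# One possible way to hand a kubeconfig to an external kubectl; the report
# above instead used a cluster-admin account backed by client certificates
microk8s config > ~/.kube/config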

Expected behavior: A successful deployment.

Logs

boboysdadda@DESKTOP-US92ARK:~$ px deploy
Pixie CLI

Running Cluster Checks:
 ✔    Kernel version > 4.14.0
 ✔    Cluster type is supported
 ✔    K8s version > 1.12.0
 ✔    Kubectl > 1.10.0 is present
 ✔    User can create namespace
Installing version: 0.5.2
Generating YAMLs for Pixie
Deploying Pixie to the following cluster: microk8s

Is the cluster correct? (y/n) [y] : y
Found 5 nodes
 ✔    Creating namespace
 ✔    Deleting stale Pixie objects, if any
 ✔    Deploying secrets and configmaps
 ✔    Deploying Cloud Connector
 ⠼    Waiting for Cloud Connector to come online
[0142] FATAL Timed out waiting for cluster ID assignment
boboysdadda@DESKTOP-US92ARK:~$ kubectl get pods -n pl
NAME                                      READY   STATUS              RESTARTS   AGE
vizier-cloud-connector-5696d4d66b-2td4h   0/1     ContainerCreating   0          20h
cert-provisioner-job-kjw6n                0/1     Error               0          20h
cert-provisioner-job-trgcv                0/1     Error               0          20h
boboysdadda@DESKTOP-US92ARK:~$ kubectl describe pod cert-provisioner-job-kjw6n -n pl
Name:         cert-provisioner-job-kjw6n
Namespace:    pl
Priority:     0
Node:         pi4-k8s-node4/192.168.2.14
Start Time:   Tue, 06 Oct 2020 21:53:08 -0600
Labels:       app=pl-monitoring
              component=vizier
              controller-uid=32a94f06-a0fb-4db7-889d-98ea8636cfa4
              job-name=cert-provisioner-job
              vizier-bootstrap=true
Annotations:  cni.projectcalico.org/podIP: 10.1.217.16/32
              cni.projectcalico.org/podIPs: 10.1.217.16/32
Status:       Failed
IP:           10.1.217.16
IPs:
  IP:           10.1.217.16
Controlled By:  Job/cert-provisioner-job
Containers:
  provisioner:
    Container ID:   containerd://19bda16f93f349759f9d54878904d2d1eb0003d68b244bc8f393a60270fa7545
    Image:          gcr.io/pixie-prod/vizier/cert_provisioner_image:0.5.2
    Image ID:       gcr.io/pixie-prod/vizier/cert_provisioner_image@sha256:e76e704b00259fe4f8bee6b8761b3676dc57a3119f153a1f980ca390f2387a9b
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 06 Oct 2020 21:53:11 -0600
      Finished:     Tue, 06 Oct 2020 21:53:11 -0600
    Ready:          False
    Restart Count:  0
    Environment Variables from:
      pl-cloud-config  ConfigMap  Optional: false
    Environment:       <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from updater-service-account-token-4d7gj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  updater-service-account-token-4d7gj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  updater-service-account-token-4d7gj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                    Message
  ----    ------     ----  ----                    -------
  Normal  Scheduled  20h   default-scheduler       Successfully assigned pl/cert-provisioner-job-kjw6n to pi4-k8s-node4
  Normal  Pulled     20h   kubelet, pi4-k8s-node4  Container image "gcr.io/pixie-prod/vizier/cert_provisioner_image:0.5.2" already present on machine
  Normal  Created    20h   kubelet, pi4-k8s-node4  Created container provisioner
  Normal  Started    20h   kubelet, pi4-k8s-node4  Started container provisioner
Name:         cert-provisioner-job-trgcv
Namespace:    pl
Priority:     0
Node:         pi4-k8s-node4/192.168.2.14
Start Time:   Tue, 06 Oct 2020 21:53:12 -0600
Labels:       app=pl-monitoring
              component=vizier
              controller-uid=32a94f06-a0fb-4db7-889d-98ea8636cfa4
              job-name=cert-provisioner-job
              vizier-bootstrap=true
Annotations:  cni.projectcalico.org/podIP: 10.1.217.17/32
              cni.projectcalico.org/podIPs: 10.1.217.17/32
Status:       Failed
IP:           10.1.217.17
IPs:
  IP:           10.1.217.17
Controlled By:  Job/cert-provisioner-job
Containers:
  provisioner:
    Container ID:   containerd://07ec2b65d5f38a892341e639b082fb6968ccee51d1c64c3085e228ca034c1f71
    Image:          gcr.io/pixie-prod/vizier/cert_provisioner_image:0.5.2
    Image ID:       gcr.io/pixie-prod/vizier/cert_provisioner_image@sha256:e76e704b00259fe4f8bee6b8761b3676dc57a3119f153a1f980ca390f2387a9b
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 06 Oct 2020 21:53:14 -0600
      Finished:     Tue, 06 Oct 2020 21:53:14 -0600
    Ready:          False
    Restart Count:  0
    Environment Variables from:
      pl-cloud-config  ConfigMap  Optional: false
    Environment:       <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from updater-service-account-token-4d7gj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  updater-service-account-token-4d7gj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  updater-service-account-token-4d7gj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                    Message
  ----    ------     ----  ----                    -------
  Normal  Scheduled  20h   default-scheduler       Successfully assigned pl/cert-provisioner-job-trgcv to pi4-k8s-node4
  Normal  Pulled     20h   kubelet, pi4-k8s-node4  Container image "gcr.io/pixie-prod/vizier/cert_provisioner_image:0.5.2" already present on machine
  Normal  Created    20h   kubelet, pi4-k8s-node4  Created container provisioner
  Normal  Started    20h   kubelet, pi4-k8s-node4  Started container provisioner
boboysdadda@DESKTOP-US92ARK:~$ kubectl describe pod -n pl vizier-cloud-connector-5696d4d66b-2td4h
Name:           vizier-cloud-connector-5696d4d66b-2td4h
Namespace:      pl
Priority:       0
Node:           pi4-k8s-node4/192.168.2.14
Start Time:     Tue, 06 Oct 2020 21:53:08 -0600
Labels:         app=pl-monitoring
                component=vizier
                name=vizier-cloud-connector
                plane=control
                pod-template-hash=5696d4d66b
                vizier-bootstrap=true
Annotations:    fluentbit.io/parser: logfmt
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/vizier-cloud-connector-5696d4d66b
Containers:
  app:
    Container ID:
    Image:          gcr.io/pixie-prod/vizier/cloud_connector_server_image:0.5.2
    Image ID:
    Port:           50800/TCP
    Host Port:      50800/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       http-get https://:50800/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      pl-cloud-config                      ConfigMap  Optional: false
      pl-cloud-connector-tls-config        ConfigMap  Optional: false
      pl-cloud-connector-bootstrap-config  ConfigMap  Optional: true
    Environment:
      PL_POD_NAME:                 vizier-cloud-connector-5696d4d66b-2td4h (v1:metadata.name)
      PL_JWT_SIGNING_KEY:          <set to the key 'jwt-signing-key' in secret 'pl-cluster-secrets'>  Optional: false
      PL_CLUSTER_ID:               <set to the key 'cluster-id' in secret 'pl-cluster-secrets'>       Optional: true
      PL_SENTRY_DSN:
      PL_DEPLOY_KEY:               <set to the key 'deploy-key' in secret 'pl-deploy-secrets'>  Optional: true
      PL_POD_NAMESPACE:            pl (v1:metadata.namespace)
      PL_MAX_EXPECTED_CLOCK_SKEW:  2000
      PL_RENEW_PERIOD:             1000
    Mounts:
      /certs from certs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from cloud-conn-service-account-token-jlg49 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  service-tls-certs
    Optional:    false
  cloud-conn-service-account-token-jlg49:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloud-conn-service-account-token-jlg49
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                   From                    Message
  ----     ------       ----                  ----                    -------
  Normal   Scheduled    20h                   default-scheduler       Successfully assigned pl/vizier-cloud-connector-5696d4d66b-2td4h to pi4-k8s-node4
  Warning  FailedMount  20h (x12 over 20h)    kubelet, pi4-k8s-node4  MountVolume.SetUp failed for volume "certs" : secret "service-tls-certs" not found
  Warning  FailedMount  20h (x4 over 20h)     kubelet, pi4-k8s-node4  Unable to attach or mount volumes: unmounted volumes=[certs], unattached volumes=[certs cloud-conn-service-account-token-jlg49]: timed out waiting for the condition
  Warning  FailedMount  84s (x12 over 9m38s)  kubelet, pi4-k8s-node4  MountVolume.SetUp failed for volume "certs" : secret "service-tls-certs" not found
  Warning  FailedMount  49s (x4 over 7m35s)   kubelet, pi4-k8s-node4  Unable to attach or mount volumes: unmounted volumes=[certs], unattached volumes=[certs cloud-conn-service-account-token-jlg49]: timed out waiting for the condition

App information:

  • Pixie version: 0.3.9+Distribution.8d5651b.20201006163355.1
  • K8s cluster version: MicroK8s 1.19 on a Raspberry Pi 4 cluster
ubuntu@pi4-k8s-master:~$ kubectl get nodes
NAME             STATUS   ROLES    AGE   VERSION
pi4-k8s-master   Ready    <none>   32h   v1.19.0-34+ff9309c628eb68
pi4-k8s-node4    Ready    <none>   32h   v1.19.0-34+ff9309c628eb68
pi4-k8s-node1    Ready    <none>   32h   v1.19.0-34+ff9309c628eb68
pi4-k8s-node3    Ready    <none>   32h   v1.19.0-34+ff9309c628eb68
pi4-k8s-node2    Ready    <none>   32h   v1.19.0-34+ff9309c628eb68

JamesAtIntegratnIO avatar Oct 08 '20 00:10 JamesAtIntegratnIO

The root of this issue is most likely the fact that we don't currently support ARM CPUs. Separately, we should confirm that Pixie works with microk8s.

htroisi avatar Oct 13 '20 19:10 htroisi

Will just add a +1 for ARM CPUs here.

liflovs avatar Sep 15 '21 06:09 liflovs

Definitely needed on AWS. Either that, or the px CLI and the operator pods should at least have an affinity so they are scheduled on amd64 nodes only; otherwise deployment will just randomly fail. How can one set this manually for the pixie-operator-index pods? They don't seem to be created by a Deployment or StatefulSet, so we can't inject that ourselves currently.
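
For the Pixie pods that are owned by a Deployment, a nodeSelector patch works as a stopgap; a minimal sketch (the deployment name and namespace are placeholders, and kubernetes.io/arch is the standard architecture label on current nodes):

# Hypothetical stopgap: pin a Deployment's pods to amd64 nodes.
# <deployment> and <namespace> are placeholders; older clusters expose
# beta.kubernetes.io/arch instead of kubernetes.io/arch.
kubectl patch deployment <deployment> -n <namespace> --type merge -p \
  '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/arch":"amd64"}}}}}'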

autarchprinceps avatar Oct 18 '21 14:10 autarchprinceps

We too are waiting for this bug fix. As ARM-based instances are cheaper, we are leveraging them as much as possible. I hope that this will be addressed soon.

gbhosal avatar Mar 25 '22 07:03 gbhosal

We're deep into ARM64 territory with our new infrastructure and services, and we cannot run New Relic's latest K8s observability stack due to a lack of ARM64 compatibility with pixie :-(

+1 to ARM64 builds & support...<3

armenr avatar May 30 '22 00:05 armenr

Many cloud providers offer ARM Kubernetes nodes, and since they are cheaper they should become a popular choice. I'd like to know whether there is work on this; right now I can't use Pixie in my cluster for that reason. Thanks.

giovannicandido avatar Sep 14 '22 15:09 giovannicandido

A quick update for those waiting for ARM support: we do have plans to support ARM, but it’s not yet under development. This work is tentatively planned for Q1'23 and we'll update this thread once we have more to share!

htroisi avatar Sep 14 '22 22:09 htroisi

Looking forward to ARM64 support so much!

r12f avatar Dec 10 '22 07:12 r12f

Hi all, thanks for your patience. With our latest release of Vizier, 0.12.17, we now have early alpha support for ARM64. Our testing infrastructure doesn't yet support ARM, so we'd like to emphasize that this is a very early version of ARM support. We're planning to fix our testing infrastructure in the near future and will announce more confident ARM support at that time.

To deploy to an ARM cluster, make sure there's at least one x86 node in the cluster (see known issue 1 below), and then just run px deploy as usual.
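
To check whether a cluster satisfies that requirement, node architecture is visible in standard kubectl output (nothing Pixie-specific here):

# Print each node's CPU architecture; at least one should report amd64
# for the default operator deploy to succeed.
kubectl get nodes -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture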

A few known issues:

  1. If you are doing an operator deploy of Vizier (which is the default), you must have at least one x86 node in your cluster. This issue is tracked at #892.
  2. Java profiling is not yet supported on ARM (tracked at #891).
  3. The etcd version of Vizier is not yet supported on ARM (tracked at #893). This means your cluster must support persistent volumes to run Vizier on ARM; a quick check is sketched below.
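
A quick way to check the persistent-volume requirement from known issue 3 (this assumes dynamic provisioning via a default StorageClass; manually provisioned PVs also satisfy it):

# A StorageClass marked "(default)" means PersistentVolumeClaims can be
# provisioned dynamically, which the non-etcd Vizier relies on.
kubectl get storageclass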

We'd love for the community to help us test ARM support. If you run into problems, feel free to open new issues describing the specific behavior on ARM, and please prefix your issue title with [ARM].

JamesMBartlett avatar Mar 01 '23 23:03 JamesMBartlett

Any updates on this? It seems ARM is still not supported, but this issue is marked closed.

pdeva avatar Feb 13 '24 23:02 pdeva