gpu-operator
RFE - Support for GPU Operator on ARM (Specifically Nvidia Jetson AGX Xavier)
I have been able to deploy a development release of Red Hat OpenShift 4.9 on RHCOS as a single-node cluster on my Nvidia Jetson AGX Xavier:
$ oc get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master-0.kni7.schmaustech.com Ready master,worker 43h v1.21.0-rc.0+ec0996b 192.168.0.47
baremetal 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
cloud-credential 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
cluster-autoscaler 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
config-operator 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
console 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 122m
csi-snapshot-controller 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 27h
dns 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 125m
etcd 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
image-registry 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
ingress 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
insights 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
kube-apiserver 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
kube-controller-manager 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
kube-scheduler 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
kube-storage-version-migrator 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
machine-api 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
machine-approver 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
machine-config 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
marketplace 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
monitoring 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 122m
network 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
node-tuning 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
openshift-apiserver 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 123m
openshift-controller-manager 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 20h
openshift-samples 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
operator-lifecycle-manager 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
operator-lifecycle-manager-catalog 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
operator-lifecycle-manager-packageserver 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 123m
service-ca 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
storage 4.9.0-0.nightly-arm64-2021-06-29-064214 True False False 43h
I would like to use the GPU Operator to access the GPU in the AGX Xavier, but I believe that is not possible right now. I tried to deploy it and got the following:
$ oc get all -n gpu-operator-resources
No resources found in gpu-operator-resources namespace.
$ oc get all | egrep 'node|gpu'
pod/gpu-operator-64df558567-r6zr8 0/1 CrashLoopBackOff 6 8m54s
deployment.apps/gpu-operator 0/1 1 0 8m54s
replicaset.apps/gpu-operator-64df558567 1 1 0 8m54s
$ oc logs gpu-operator-64df558567-r6zr8
standard_init_linux.go:219: exec user process caused: exec format error
Is this something planned in the future?
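For context, an `exec format error` at container start usually means the kernel could not execute the binary in the image, most often because the image was built for a different CPU architecture than the node. The sketch below shows one way to compare the two. It assumes the operator image in question was built for amd64 only, which the log above suggests but does not confirm. The helper names `normalize_arch` and `check_arch` are hypothetical and are not part of any NVIDIA or OpenShift tooling.

```shell
#!/bin/sh
# Normalize an architecture name to the form used in container image
# manifests (for example, `uname -m` prints aarch64 where images say arm64).
normalize_arch() {
  case "$1" in
    x86_64 | amd64) echo amd64 ;;
    aarch64 | arm64) echo arm64 ;;
    *) echo "$1" ;;  # pass unknown names through unchanged
  esac
}

# Hypothetical helper: $1 is the node architecture, $2 is the architecture
# the image was built for. Returns non-zero when they differ.
check_arch() {
  node_arch=$(normalize_arch "$1")
  image_arch=$(normalize_arch "$2")
  if [ "$node_arch" = "$image_arch" ]; then
    echo "ok: node and image are both $node_arch"
  else
    echo "mismatch: node is $node_arch, image is $image_arch"
    return 1
  fi
}
```

The node architecture can be read from the cluster with `oc get nodes -o jsonpath='{.items[0].status.nodeInfo.architecture}'`. The image architecture has to come from the image manifest. On the Jetson, a call such as `check_arch arm64 amd64` would report a mismatch under the assumption stated above.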
@schmaustech I will get back to you on this.
@shivamerla Any movement or update on this?
@schmaustech Support for GPU Operator on ARM is currently targeted for Q1 2022.
@shivamerla - Any update on this, please?
@shivamerla How's this going?
@jasonbarbee @David-VTUK While GPU Operator v1.10.x added support for the ARM platform, Jetson devices are not yet supported. That requires changes in k8s-device-plugin and container-toolkit, which are on the roadmap.
Any update on this in 2024?