GPUMounter-worker error in k8s v1.23.1
GPUMounter-master.log:

```
2022-01-16T11:24:14.610Z INFO GPUMounter-master/main.go:25 access add gpu service
2022-01-16T11:24:14.610Z INFO GPUMounter-master/main.go:30 Pod: test Namespace: default GPU Num: 1 Is entire mount: false
2022-01-16T11:24:14.627Z INFO GPUMounter-master/main.go:66 Found Pod: test in Namespace: default on Node: rtxws
2022-01-16T11:24:14.634Z INFO GPUMounter-master/main.go:265 Worker: gpu-mounter-workers-7dsdf Node: rtxws
2022-01-16T11:24:19.648Z ERROR GPUMounter-master/main.go:98 Failed to call add gpu service
2022-01-16T11:24:19.648Z ERROR GPUMounter-master/main.go:99 rpc error: code = Unknown desc = Service Internal Error
```
GPUMounter-worker.log:

```
2022-01-16T11:24:14.635Z INFO gpu-mount/server.go:35 AddGPU Service Called
2022-01-16T11:24:14.635Z INFO gpu-mount/server.go:36 request: pod_name:"test" namespace:"default" gpu_num:1
2022-01-16T11:24:14.645Z INFO gpu-mount/server.go:55 Successfully get Pod: default in cluster
2022-01-16T11:24:14.645Z INFO allocator/allocator.go:159 Get pod default/test mount type
2022-01-16T11:24:14.645Z INFO collector/collector.go:91 Updating GPU status
2022-01-16T11:24:14.646Z INFO collector/collector.go:136 GPU status update successfully
2022-01-16T11:24:14.657Z INFO allocator/allocator.go:59 Creating GPU Slave Pod: test-slave-pod-2f66ed for Owner Pod: test
2022-01-16T11:24:14.657Z INFO allocator/allocator.go:238 Checking Pods: test-slave-pod-2f66ed state
2022-01-16T11:24:14.661Z INFO allocator/allocator.go:264 Pod: test-slave-pod-2f66ed creating
2022-01-16T11:24:19.442Z INFO allocator/allocator.go:277 Pods: test-slave-pod-2f66ed are running
2022-01-16T11:24:19.442Z INFO allocator/allocator.go:84 Successfully create Slave Pod: %s, for Owner Pod: %s test-slave-pod-2f66edtest
2022-01-16T11:24:19.442Z INFO collector/collector.go:91 Updating GPU status
2022-01-16T11:24:19.444Z DEBUG collector/collector.go:130 GPU: /dev/nvidia0 allocated to Pod: test-slave-pod-2f66ed in Namespace gpu-pool
2022-01-16T11:24:19.444Z INFO collector/collector.go:136 GPU status update successfully
2022-01-16T11:24:19.444Z INFO gpu-mount/server.go:81 Start mounting, Total: 1 Current: 1
2022-01-16T11:24:19.444Z INFO util/util.go:19 Start mount GPU: {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-7fe47fc1-b21e-e675-f6ff-edd91910f8a7","State":"GPU_ALLOCATED_STATE","PodName":"test-slave-pod-2f66ed","Namespace":"gpu-pool"} to Pod: test
2022-01-16T11:24:19.444Z INFO util/util.go:24 Pod :test container ID: e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3
2022-01-16T11:24:19.444Z INFO util/util.go:30 Successfully get cgroup path: /kubepods/burstable/podc815ee4b-bea0-44ed-8ef4-239e69516ba2/e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3 for Pod: test
2022-01-16T11:24:19.445Z ERROR cgroup/cgroup.go:140 Exec "echo 'c 195:0 rw' > /sys/fs/cgroup/devices/kubepods/burstable/podc815ee4b-bea0-44ed-8ef4-239e69516ba2/e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3/devices.allow" failed
2022-01-16T11:24:19.445Z ERROR cgroup/cgroup.go:141 Output: sh: 1: cannot create /sys/fs/cgroup/devices/kubepods/burstable/podc815ee4b-bea0-44ed-8ef4-239e69516ba2/e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3/devices.allow: Directory nonexistent
2022-01-16T11:24:19.445Z ERROR cgroup/cgroup.go:142 exit status 2
2022-01-16T11:24:19.445Z ERROR util/util.go:33 Add GPU {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-7fe47fc1-b21e-e675-f6ff-edd91910f8a7","State":"GPU_ALLOCATED_STATE","PodName":"test-slave-pod-2f66ed","Namespace":"gpu-pool"}failed
2022-01-16T11:24:19.445Z ERROR gpu-mount/server.go:84 Mount GPU: {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-7fe47fc1-b21e-e675-f6ff-edd91910f8a7","State":"GPU_ALLOCATED_STATE","PodName":"test-slave-pod-2f66ed","Namespace":"gpu-pool"} to Pod: test in Namespace: default failed
2022-01-16T11:24:19.445Z ERROR gpu-mount/server.go:85 exit status 2
```
Environment and versions:
- k8s version: v1.23
- docker-client version: 19.03.13
- docker-server version: 20.10.12
In k8s v1.23, "/sys/fs/cgroup/devices/kubepods/burstable/pod[pod-id]/[container-id]/devices.allow" has changed to "/sys/fs/cgroup/devices/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod[pod-id]/docker-[container-id].scope/devices.allow".
So the current GPUMounter cannot work properly on v1.23. A sketch of the path difference is below.
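To illustrate the difference, here is a minimal Go sketch (illustrative only, not GPUMounter's code; the exact systemd naming details, e.g. the `.slice` suffix and `-` in the pod UID being escaped to `_`, are my assumption based on the typical kubelet systemd layout and should be verified on the node):

```go
// Minimal sketch of how the devices cgroup path differs between the
// cgroupfs and systemd cgroup drivers for a burstable pod's container.
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// devicesAllowPath builds the devices.allow path for the given cgroup driver.
func devicesAllowPath(driver, podUID, containerID string) string {
	if driver == "systemd" {
		// systemd driver: slice/scope naming, "-" in the pod UID escaped to "_"
		podSlice := "kubepods-burstable-pod" + strings.ReplaceAll(podUID, "-", "_") + ".slice"
		return filepath.Join("/sys/fs/cgroup/devices",
			"kubepods.slice", "kubepods-burstable.slice",
			podSlice, "docker-"+containerID+".scope", "devices.allow")
	}
	// cgroupfs driver: the layout GPUMounter currently assumes
	return filepath.Join("/sys/fs/cgroup/devices",
		"kubepods", "burstable", "pod"+podUID, containerID, "devices.allow")
}

func main() {
	podUID := "c815ee4b-bea0-44ed-8ef4-239e69516ba2"
	containerID := "e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3"
	fmt.Println(devicesAllowPath("cgroupfs", podUID, containerID))
	fmt.Println(devicesAllowPath("systemd", podUID, containerID))
}
```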
Could you update GPUMounter to support k8s v1.23? Thanks.
Thanks for your feedback. I will try to fix it. PRs are also very welcome!
OK, this bug is solved.
Environment and versions used:
- OS: ubuntu 20.04.1
- k8s version: v1.23.1
- docker-client version: 19.03.13
- docker-server version: 20.10.12
- CRI: docker
- cgroup driver: systemd
I use the NVIDIA k8s-device-plugin, and my `/etc/docker/daemon.json` contains:
```json
{
    "exec-opts": ["native.cgroupdriver=systemd"],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "100m"
    },
    "storage-driver": "overlay2",
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```
In the code of `/pkg/util/util.go`, "cgroupfs" is always passed as cgroupDriver when calling the GetCgroupName function, which then causes the error.
So this bug is not a k8s version problem!
Also, pod IDs contain `_` in k8s v1.23.1, so the `_` character check in the NewCgroupName function should be removed.
So we need to detect which cgroup driver is currently in use (a rough sketch is below). But I'm a rookie at Golang, so I need more time for coding; I will send a PR in a few days.
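A rough Go sketch of the detection idea (illustrative only, not actual GPUMounter code; it simply checks which kubepods layout exists under the devices hierarchy on the node):

```go
// Sketch: infer the cgroup driver from the kubepods directory layout.
package main

import (
	"fmt"
	"os"
)

// detectCgroupDriver returns "systemd" if the systemd-style kubepods slice
// exists under the devices hierarchy, otherwise assumes "cgroupfs".
func detectCgroupDriver() string {
	if _, err := os.Stat("/sys/fs/cgroup/devices/kubepods.slice"); err == nil {
		return "systemd"
	}
	return "cgroupfs"
}

func main() {
	fmt.Println("detected cgroup driver:", detectCgroupDriver())
}
```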
Also, does the issue title need to be edited? If so, I can edit it directly.
@cool9203 Happy Spring Festival!
Thanks for your efforts. Sorry for keeping you waiting so long.
- The check for `_` is there to handle the systemd cgroup driver. But if `_` can appear in the pod id, it may be complex to handle. Can you show me some k8s documentation about `_` in pod ids? https://github.com/pokerfaceSad/GPUMounter/blob/7036133177eabe2e32e03b33392df17dd8945dd1/pkg/util/cgroup/cgroup.go#L33-L40
- Passing the constant `cgroupfs` is really a bug! It should be configurable. https://github.com/pokerfaceSad/GPUMounter/blob/7036133177eabe2e32e03b33392df17dd8945dd1/pkg/util/util.go#L25
@pokerfaceSad Happy Spring Festival!! Thanks for your reply.
- https://github.com/pokerfaceSad/GPUMounter/blob/7036133177eabe2e32e03b33392df17dd8945dd1/pkg/util/cgroup/cgroup.go#L33-L40
You're right; I tested it today and it works.
This edit is not necessary; it was just from my initial testing.
My actual bug was passing `cgroupfs` in https://github.com/pokerfaceSad/GPUMounter/blob/7036133177eabe2e32e03b33392df17dd8945dd1/pkg/util/util.go#L25
I ran into another problem.
- https://github.com/pokerfaceSad/GPUMounter/blob/7036133177eabe2e32e03b33392df17dd8945dd1/pkg/server/gpu-mount/server.go#L124-L135
When calling RemoveGPU, I sometimes get the error `Invalid UUIDs`. I tracked this error and found that the slave pod's status is Terminating, and then the pod gets deleted. Example:
https://github.com/pokerfaceSad/GPUMounter/blob/7036133177eabe2e32e03b33392df17dd8945dd1/pkg/util/gpu/collector/collector.go#L90
So UpdateGPUStatus will not find any slave pod, and then no GPU resource is found for the pod with the mounted GPU. Maybe this error only occurs in k8s v1.23.1? Or does it happen in other versions too?

```
kubectl get pod --all-namespaces
NAMESPACE   NAME                            READY   STATUS        RESTARTS   AGE
gpu-pool    test460c04d4-slave-pod-bca118   1/1     Terminating   0          30s
```
https://github.com/pokerfaceSad/GPUMounter/issues/19#issuecomment-1033637663 Maybe I have solved this.
https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/
It seems that from k8s v1.20+, the owner pod and slave pod need to be in the same namespace.
If the owner pod and slave pod are not in the same namespace, the slave pod's status becomes Terminating.
So the slave pod's namespace needs to be set to the same namespace as the owner pod (see the sketch after this comment).
But I need to test more; I will report the test results.
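A minimal client-go sketch of this idea (my assumption of how it could look, not GPUMounter's actual allocator code): create the slave pod in the owner pod's namespace and give it an OwnerReference to the owner pod, since cross-namespace owner references are flagged by the garbage collector in k8s v1.20+ (the OwnerRefInvalidNamespace event shown below) and the dependent pod gets deleted.

```go
// Sketch: create a slave pod owned by ownerPod, in ownerPod's namespace.
package allocator

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// CreateSlavePod creates a slave pod whose OwnerReference points at ownerPod.
func CreateSlavePod(client kubernetes.Interface, ownerPod *corev1.Pod, slaveName string) (*corev1.Pod, error) {
	slave := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name: slaveName,
			// Key point: reuse the owner pod's namespace instead of a fixed "gpu-pool".
			Namespace: ownerPod.Namespace,
			OwnerReferences: []metav1.OwnerReference{
				*metav1.NewControllerRef(ownerPod, corev1.SchemeGroupVersion.WithKind("Pod")),
			},
		},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "gpu-container",
				Image: "alpine:latest",
				// Keep the container alive; the GPU resource request is omitted here.
				Command: []string{"sleep", "86400"},
			}},
		},
	}
	return client.CoreV1().Pods(ownerPod.Namespace).Create(context.TODO(), slave, metav1.CreateOptions{})
}
```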
Update:
My test results:
```
kubectl get pod -n gpu-pool
NAME                    READY   STATUS    RESTARTS   AGE
test                    1/1     Running   0          3m12s
test-slave-pod-d34ea2   1/1     Running   0          19s
```
pod/test.yaml:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: gpu-pool
  labels:
    app: test
spec:
  containers:
    - name: test
      image: [docker-image]
      resources:
        requests:
          memory: "1024M"
          cpu: "1"
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "none"
```
kubectl describe pod test-slave-pod-d34ea2 -n gpu-pool:

```
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  4s    default-scheduler  Successfully assigned gpu-pool/test-slave-pod-290964 to rtxws
  Normal  Pulling    3s    kubelet            Pulling image "alpine:latest"
  Normal  Pulled     1s    kubelet            Successfully pulled image "alpine:latest" in 2.563965249s
  Normal  Created    1s    kubelet            Created container gpu-container
  Normal  Started    1s    kubelet            Started container gpu-container
```
Pod events when the owner pod and slave pod are not in the same namespace:
```
Events:
  Type     Reason                    Age  From                          Message
  ----     ------                    ---- ----                          -------
  Normal   Scheduled                 4s   default-scheduler             Successfully assigned gpu-pool/test460c04d4-slave-pod-22d29a to rtxws
  Warning  OwnerRefInvalidNamespace  5s   garbage-collector-controller  ownerRef [v1/Pod, namespace: gpu-pool, name: test460c04d4, uid: a55bc88b-60d1-460f-a7c7-4072fe6a9a2c] does not exist in namespace "gpu-pool"
  Normal   Pulling                   4s   kubelet                       Pulling image "alpine:latest"
  Normal   Pulled                    1s   kubelet                       Successfully pulled image "alpine:latest" in 2.568386225s
  Normal   Created                   1s   kubelet                       Created container gpu-container
  Normal   Started                   1s   kubelet                       Started container gpu-container
  Normal   Killing                   0s   kubelet                       Stopping container gpu-container
```
As you can see, in this test the namespace in pod/test.yaml was changed to gpu-pool.
Now the slave pod's status is Running, not Terminating.
Checking the pod events also shows the pod running, not being stopped.
I left it idle for 15 minutes as a test; the slave pod stays Running and is not deleted.
And in the event log of the not-same-namespace case, you can see `does not exist in namespace "gpu-pool"`.
So in k8s v1.20+, the slave pod and the owner pod must be in the same namespace.
If they are not, the slave pod's status will be Terminating,
and calling the RemoveGPU service will show the `Invalid UUIDs` error.
Maybe the gpu-pool namespace should not be used in k8s v1.20+:
the slave pod should always use the owner pod's namespace instead of gpu-pool.
Is this a good idea or not? Please give me your advice, thanks!!
@cool9203 Thank you for revealing this! The reason why the slave pod can't be created in the owner pod's namespace is #3. Maybe some modifications are needed to adapt to k8s v1.20+.
@cool9203
The bug of the constant cgroup driver has been fixed in https://github.com/pokerfaceSad/GPUMounter/commit/163ef7b10e7b53180033d1585c9e637c72b3b105.
The cgroup driver can now be set in /deploy/gpu-mounter-workers.yaml via the environment variable CGROUP_DRIVER.
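A small Go sketch of what reading that variable might look like (illustrative only; the actual change is in the commit above, and the fallback behavior shown here is my assumption):

```go
// Sketch: read the cgroup driver from the CGROUP_DRIVER environment variable,
// as set on the worker in /deploy/gpu-mounter-workers.yaml.
package config

import "os"

// CgroupDriver returns the configured cgroup driver, defaulting to "cgroupfs".
func CgroupDriver() string {
	if driver := os.Getenv("CGROUP_DRIVER"); driver == "systemd" || driver == "cgroupfs" {
		return driver
	}
	return "cgroupfs"
}
```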
@pokerfaceSad Sorry for the late reply.
> @cool9203 The bug of constant `cgroup driver` has been fixed in 163ef7b. `cgroup driver` can be set in /deploy/gpu-mounter-workers.yaml by environment variable `CGROUP_DRIVER`.
Thanks for your fix; passing an environment variable in the worker yaml is a good idea!
> @cool9203 Thank you for revealing this! The reason why the slave pod can't be created in the owner pod's namespace is #3. Maybe some modifications are needed to adapt to k8s v1.20+.
I showed one possible solution in https://github.com/pokerfaceSad/GPUMounter/issues/19#issuecomment-1034134013.
In that solution, the owner pod and slave pod must be in the same namespace, whether that is gpu-pool, default, kube-system, or any other namespace.
And I did not set any resource quota.
So, as that solution shows, I think the owner pod and slave pod must be in the same namespace in k8s v1.20+.
What do you think?
@cool9203 In fact, slave pods were created in the owner pod's namespace before https://github.com/pokerfaceSad/GPUMounter/commit/a378e39793c241d40a80387eab11aa996c95cc93.
However, in a multi-tenant cluster scenario, the cluster administrator may use the resource quota feature to limit the resource usage of users.
If GPUMounter created the slave pods in the owner pod's namespace, the slave pods would consume the user's resource quota.