GPUMounter
Will cgroup v2 be supported in the future?
I tested this successfully with cgroup v1, but it fails with cgroup v2.
Here are the logs:
2025-01-13T03:41:56.696Z INFO GPUMounter-worker/main.go:15 Service Starting...
2025-01-13T03:41:56.696Z INFO gpu-mount/server.go:22 Creating gpu mounter
2025-01-13T03:41:56.696Z INFO allocator/allocator.go:28 Creating gpu allocator
2025-01-13T03:41:56.696Z INFO collector/collector.go:24 Creating gpu collector
2025-01-13T03:41:56.696Z INFO collector/collector.go:42 Start get gpu info
2025-01-13T03:41:56.704Z INFO collector/collector.go:53 GPU Num: 1
2025-01-13T03:41:56.710Z INFO collector/collector.go:91 Updating GPU status
2025-01-13T03:41:56.711Z INFO collector/collector.go:136 GPU status update successfully
2025-01-13T03:41:56.711Z INFO collector/collector.go:36 Successfully update gpu status
2025-01-13T03:41:56.711Z INFO allocator/allocator.go:35 Successfully created gpu collector
2025-01-13T03:41:56.711Z INFO gpu-mount/server.go:29 Successfully created gpu allocator
2025-01-13T03:41:56.711Z INFO GPUMounter-worker/main.go:22 Successfully created gpu mounter
2025-01-13T03:41:58.732Z INFO gpu-mount/server.go:35 AddGPU Service Called
2025-01-13T03:41:58.732Z INFO gpu-mount/server.go:36 request: pod_name:"owner-pod" namespace:"default" gpu_num:1
2025-01-13T03:41:58.750Z INFO gpu-mount/server.go:55 Successfully get Pod: default in cluster
2025-01-13T03:41:58.750Z INFO allocator/allocator.go:159 Get pod default/owner-pod mount type
2025-01-13T03:41:58.750Z INFO collector/collector.go:91 Updating GPU status
2025-01-13T03:41:58.750Z INFO collector/collector.go:136 GPU status update successfully
2025-01-13T03:41:58.758Z INFO allocator/allocator.go:59 Creating GPU Slave Pod: owner-pod-slave-pod-40a529 for Owner Pod: owner-pod
2025-01-13T03:41:58.758Z INFO allocator/allocator.go:239 Checking Pods: owner-pod-slave-pod-40a529 state
2025-01-13T03:41:58.760Z INFO allocator/allocator.go:265 Pod: owner-pod-slave-pod-40a529 creating
2025-01-13T03:41:58.762Z INFO allocator/allocator.go:265 Pod: owner-pod-slave-pod-40a529 creating
2025-01-13T03:41:58.763Z INFO allocator/allocator.go:265 Pod: owner-pod-slave-pod-40a529 creating
2025-01-13T03:41:58.765Z INFO allocator/allocator.go:265 Pod: owner-pod-slave-pod-40a529 creating
2025-01-13T03:42:00.142Z INFO allocator/allocator.go:278 Pods: owner-pod-slave-pod-40a529 are running
2025-01-13T03:42:00.142Z INFO allocator/allocator.go:84 Successfully create Slave Pod: owner-pod-slave-pod-40a529, for Owner Pod: owner-pod
2025-01-13T03:42:00.142Z INFO collector/collector.go:91 Updating GPU status
2025-01-13T03:42:00.143Z DEBUG collector/collector.go:130 GPU: /dev/nvidia0 allocated to Pod: owner-pod-slave-pod-40a529 in Namespace default
2025-01-13T03:42:00.143Z INFO collector/collector.go:136 GPU status update successfully
2025-01-13T03:42:00.143Z INFO gpu-mount/server.go:81 Start mounting, Total: 1 Current: 1
2025-01-13T03:42:00.143Z INFO util/util.go:19 Start mount GPU: {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-cf2f070a-ff5a-ee2b-16ac-047f7e9c16bb","State":"GPU_ALLOCATED_STATE","PodName":"owner-pod-slave-pod-40a529","Namespace":"default"} to Pod: owner-pod
2025-01-13T03:42:00.143Z INFO util/util.go:24 Pod :owner-pod container ID: a893d886e17a63b2b056cef9766df9ae2b0a1f130a2c6952ed029a0fe5b1b740
2025-01-13T03:42:00.143Z INFO util/util.go:35 Successfully get cgroup path: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod4fea5b3d_b5ff_4e7f_a0f6_31a9ba061196.slice/docker-a893d886e17a63b2b056cef9766df9ae2b0a1f130a2c6952ed029a0fe5b1b740.scope for Pod: owner-pod
2025-01-13T03:42:00.145Z ERROR cgroup/cgroup.go:148 Exec "echo 'c 195:0 rw' > /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod4fea5b3d_b5ff_4e7f_a0f6_31a9ba061196.slice/docker-a893d886e17a63b2b056cef9766df9ae2b0a1f130a2c6952ed029a0fe5b1b740.scope/devices.allow" failed
2025-01-13T03:42:00.145Z ERROR cgroup/cgroup.go:149 Output: sh: 1: cannot create /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod4fea5b3d_b5ff_4e7f_a0f6_31a9ba061196.slice/docker-a893d886e17a63b2b056cef9766df9ae2b0a1f130a2c6952ed029a0fe5b1b740.scope/devices.allow: Directory nonexistent
2025-01-13T03:42:00.145Z ERROR cgroup/cgroup.go:150 exit status 2
2025-01-13T03:42:00.145Z ERROR util/util.go:38 Add GPU {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-cf2f070a-ff5a-ee2b-16ac-047f7e9c16bb","State":"GPU_ALLOCATED_STATE","PodName":"owner-pod-slave-pod-40a529","Namespace":"default"} failed
2025-01-13T03:42:00.145Z ERROR gpu-mount/server.go:84 Mount GPU: {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-cf2f070a-ff5a-ee2b-16ac-047f7e9c16bb","State":"GPU_ALLOCATED_STATE","PodName":"owner-pod-slave-pod-40a529","Namespace":"default"} to Pod: owner-pod in Namespace: default failed
2025-01-13T03:42:00.145Z ERROR gpu-mount/server.go:85 exit status 2
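If I read the error correctly, the worker grants device access by writing a v1 rule ("c 195:0 rw") into the container's devices.allow file, and that path simply does not exist on a node running the unified hierarchy. For reference, here is a small Go sketch of how the cgroup mode of a node can be detected; cgroupMode is just a helper name I made up for illustration, not anything from GPUMounter:

package main

import (
	"fmt"
	"os"
)

// cgroupMode reports which cgroup hierarchy the node appears to use.
// On cgroup v2, /sys/fs/cgroup/cgroup.controllers exists at the cgroup root;
// on cgroup v1, per-controller directories such as /sys/fs/cgroup/devices
// exist instead.
func cgroupMode() string {
	if _, err := os.Stat("/sys/fs/cgroup/cgroup.controllers"); err == nil {
		return "v2 (unified)"
	}
	if _, err := os.Stat("/sys/fs/cgroup/devices"); err == nil {
		return "v1 (legacy)"
	}
	return "unknown/hybrid"
}

func main() {
	fmt.Println("cgroup mode:", cgroupMode())
}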
I checked the filesystem on my node: there is no /sys/fs/cgroup/devices directory at all.
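From what I understand, cgroup v2 dropped the devices controller entirely, so there is no devices.allow to write to; device access on the unified hierarchy is enforced by an eBPF program of type BPF_PROG_TYPE_CGROUP_DEVICE attached to the container's cgroup. Below is only a rough sketch of that mechanism (not GPUMounter code): it uses the cilium/ebpf library and a placeholder cgroup path, loads a trivial allow-all device program, and attaches it to the cgroup.

package main

import (
	"log"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/asm"
	"github.com/cilium/ebpf/link"
)

func main() {
	// Placeholder cgroup v2 path of the target container (assumption, not a real path).
	const cgroupPath = "/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/example.scope"

	// Minimal device program that allows every access (always returns 1).
	// A real implementation would inspect the bpf_cgroup_dev_ctx fields
	// (access type, major, minor) and only permit the requested rule,
	// e.g. c 195:0 rw for /dev/nvidia0.
	prog, err := ebpf.NewProgram(&ebpf.ProgramSpec{
		Type: ebpf.CGroupDevice,
		Instructions: asm.Instructions{
			asm.Mov.Imm(asm.R0, 1), // verdict: allow
			asm.Return(),
		},
		License: "GPL",
	})
	if err != nil {
		log.Fatalf("load device program: %v", err)
	}
	defer prog.Close()

	// Attach the program to the container's cgroup. Closing the link
	// detaches it again, so a real tool would keep it open or pin it.
	l, err := link.AttachCgroup(link.CgroupOptions{
		Path:    cgroupPath,
		Attach:  ebpf.AttachCGroupDevice,
		Program: prog,
	})
	if err != nil {
		log.Fatalf("attach to cgroup: %v", err)
	}
	defer l.Close()
}

In practice the container runtime already attaches its own device program to the cgroup, so I suppose real v2 support would have to extend or replace that program (for example, attaching with BPF_F_ALLOW_MULTI) rather than just echoing into a file.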