gpushare-device-plugin icon indicating copy to clipboard operation
gpushare-device-plugin copied to clipboard

question about pods with multiple containers

Open monstercy opened this issue 5 years ago • 3 comments

Hi @cheyang,

I am currently studing the code and have a question about pods that have multiple containers. I trace the code in kubelet and find that it called the allocate function for each container.

for _, container := range pod.Spec.Containers {
		if err := m.allocateContainerResources(pod, &container, devicesToReuse); err != nil {
			return err
		}
		m.podDevices.removeContainerAllocatedResources(string(pod.UID), container.Name, devicesToReuse)
	}
		devs := allocDevices.UnsortedList()
		// TODO: refactor this part of code to just append a ContainerAllocationRequest
		// in a passed in AllocateRequest pointer, and issues a single Allocate call per pod.
		klog.V(3).Infof("Making allocation request for devices %v for device plugin %s", devs, resource)
		resp, err := eI.e.allocate(devs)
		metrics.DevicePluginAllocationDuration.WithLabelValues(resource).Observe(metrics.SinceInSeconds(startRPCTime))
		metrics.DeprecatedDevicePluginAllocationLatency.WithLabelValues(resource).Observe(metrics.SinceInMicroseconds(startRPCTime))

this case may corrupt the finding pod logic in device plugin allocate function . Have you meet this issue?

monstercy avatar May 08 '19 07:05 monstercy

Thanks @cheyang 👍 for all the work you have done, really an excellent job.

I vote for this issue: +1

For example, one pod have two containers, each needs 3Gi memory, kubelet will call allocate function for each container:

   	for _, container := range pod.Spec.Containers {
		if err := m.allocateContainerResources(pod, &container, devicesToReuse); err != nil {
			return err
		}
...
        resp, err := eI.e.allocate(devs)
...
	return e.client.Allocate(context.Background(), &pluginapi.AllocateRequest{
		ContainerRequests: []*pluginapi.ContainerAllocateRequest{
			{DevicesIDs: devs},
		},
	})

So such logic in device plugin makes no sense, cause reqs only contains one container

	// podReqGPU = uint(0)
	for _, req := range reqs.ContainerRequests {
		podReqGPU += uint(len(req.DevicesIDs))
	}

Thus the following equation may never be satisfied for this case

	if getGPUMemoryFromPodResource(pod) == podReqGPU {

@cheyang WDYT

YuxiJin-tobeyjin avatar May 09 '19 03:05 YuxiJin-tobeyjin

Thanks, @YuxiJin-tobeyjin @monstercy , I think you are right. It should be handled. I will take a look at this later. If you have solutions, your contributions are welcome.

cheyang avatar Jun 15 '19 10:06 cheyang

Anything new here? It seems that the issue is still present.

xhejtman avatar Jul 22 '22 15:07 xhejtman