gpushare-device-plugin
question about pods with multiple containers
Hi @cheyang,
I am currently studying the code and have a question about pods that have multiple containers. Tracing the code in kubelet, I found that it calls the allocate function once for each container:
```go
for _, container := range pod.Spec.Containers {
	if err := m.allocateContainerResources(pod, &container, devicesToReuse); err != nil {
		return err
	}
	m.podDevices.removeContainerAllocatedResources(string(pod.UID), container.Name, devicesToReuse)
}
```
```go
devs := allocDevices.UnsortedList()
// TODO: refactor this part of code to just append a ContainerAllocationRequest
// in a passed in AllocateRequest pointer, and issues a single Allocate call per pod.
klog.V(3).Infof("Making allocation request for devices %v for device plugin %s", devs, resource)
resp, err := eI.e.allocate(devs)
metrics.DevicePluginAllocationDuration.WithLabelValues(resource).Observe(metrics.SinceInSeconds(startRPCTime))
metrics.DeprecatedDevicePluginAllocationLatency.WithLabelValues(resource).Observe(metrics.SinceInMicroseconds(startRPCTime))
```
This can break the pod-lookup logic in the device plugin's Allocate function. Have you run into this issue?
Thanks @cheyang 👍 for all the work you have done, really an excellent job.
I vote for this issue: +1
For example, if a pod has two containers, each requesting 3Gi of GPU memory, kubelet will call the allocate function once for each container:
```go
for _, container := range pod.Spec.Containers {
	if err := m.allocateContainerResources(pod, &container, devicesToReuse); err != nil {
		return err
	}
	...
	resp, err := eI.e.allocate(devs)
	...
	return e.client.Allocate(context.Background(), &pluginapi.AllocateRequest{
		ContainerRequests: []*pluginapi.ContainerAllocateRequest{
			{DevicesIDs: devs},
		},
	})
```
So this logic in the device plugin makes no sense for multi-container pods, because reqs only ever contains a single container's request:
```go
podReqGPU := uint(0)
for _, req := range reqs.ContainerRequests {
	podReqGPU += uint(len(req.DevicesIDs))
}
```
Thus the following condition may never be satisfied in this case:
```go
if getGPUMemoryFromPodResource(pod) == podReqGPU {
```
@cheyang WDYT
Thanks @YuxiJin-tobeyjin @monstercy, I think you are right. It should be handled; I will take a look at it later. If you have a solution, your contribution is welcome.
Anything new here? It seems the issue is still present.