freelizhun

Results 4 issues of freelizhun

#### What type of PR is this? /kind feature #### What this PR does / why we need it: Propose a scheduler plugin to reschedule pods that encountered some errors...

kind/feature
ok-to-test
size/L
cncf-cla: yes
release-note-none
lifecycle/stale

### Area - [X] Scheduler - [ ] Controller - [ ] Helm Chart - [ ] Documents ### Other components _No response_ ### What happened? The current kubernetes scheduler...

kind/feature
lifecycle/rotten

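* A node has 2 GPUs, one with 10G of memory and the other with 20G. gpushare-device-plugin may virtualize only 10 device ids for the 20G GPU, i.e. its gpu-mem is only 10G, so the total gpu-mem is only 20G.
* In the gpushare-device-plugin allocate operation, kubelet only ever sends the device ids requested by a single container, while gpushare-device-plugin does its matching against the device ids of all containers in the pod. It may therefore fail to find the pod, causing the container to fail to start, e.g. when the pod has 2 containers.
* pod.annotations: ALIYUN_COM_GPU_MEM_IDX: 0 can only ever assign a single GPU id. If a pod's gpu-mem: 12 exceeds the 10G of a single GPU, the pod simply cannot be scheduled.
* gpushare-scheduler-extender is called from default-scheduler over HTTP, which adversely affects scheduler performance.
* Although a pod can define the gpu-mem: 2 parameter, there is in practice no way to limit at the GPU level how much memory the pod uses. It only logically limits the number of pods that can run on a GPU, similar to [time-slicing](https://github.com/NVIDIA/k8s-device-plugin/blob/v0.14.2/README.md#shared-access-to-gpus-with-cuda-time-slicing) in nvidia k8s-device-plugin.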
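The arithmetic behind the first and third points above can be sketched as follows. This is a simplified model and not gpushare-device-plugin's actual code: the function names are hypothetical, and it assumes that every GPU is advertised with the memory size of the first GPU and that a pod must fit entirely on one GPU.

```python
from typing import List, Optional


def advertised_units(gpu_mem_gib: List[int]) -> List[int]:
    """Return the gpu-mem units advertised per GPU.

    Assumption (hypothetical model): every GPU is sized from the first
    GPU's memory, so a larger GPU is under-reported.
    """
    if not gpu_mem_gib:
        return []
    per_gpu = gpu_mem_gib[0]
    return [per_gpu] * len(gpu_mem_gib)


def pick_single_gpu(free_units: List[int], request: int) -> Optional[int]:
    """Return the index of the first GPU that can hold the whole request.

    Assumption: one pod maps to exactly one GPU index, so a request is
    never split across GPUs. Returns None when no single GPU fits.
    """
    for idx, free in enumerate(free_units):
        if free >= request:
            return idx
    return None


# Node with a 10G GPU and a 20G GPU.
physical = [10, 20]
advertised = advertised_units(physical)

# Total advertised gpu-mem is 20, although 30G is physically present.
total_advertised = sum(advertised)

# A pod asking for gpu-mem: 12 fits the physical 20G GPU,
# but not any GPU as advertised under this model.
fits_physical = pick_single_gpu(physical, 12)
fits_advertised = pick_single_gpu(advertised, 12)
```

Under these assumptions the node advertises 20 units in total, and a request of 12 finds no GPU index even though the second physical GPU could hold it.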

Can you send me the papers about voda to [email protected]? I can't download from this [link](https://hdl.handle.net/11296/hj4rzp). Thank you very much!