freelizhun issues

Results 4 issues of freelizhun

KEP: Error Pods Rescheduling

#### What type of PR is this? /kind feature #### What this PR does / why we need it: Propose a scheduler plugin to reschedule pods that encountered some errors...

kind/feature

ok-to-test

size/L

cncf-cla: yes

release-note-none

lifecycle/stale

A new scheduler plugin for rescheduling scheduled pods which encountered some errors to other nodes

### Area - [X] Scheduler - [ ] Controller - [ ] Helm Chart - [ ] Documents ### Other components _No response_ ### What happened? The current Kubernetes scheduler...

kind/feature

lifecycle/rotten

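Problems currently encountered when using this project

* A node has 2 GPUs, one with 10G of memory and the other with 20G. gpushare-device-plugin may virtualize only 10 device ids for the 20G GPU, i.e. its gpu-mem is only 10G, so the total gpu-mem is only 20G.
* In the gpushare-device-plugin allocate operation, kubelet only ever sends the device ids requested by a single container, while gpushare-device-plugin matches against the device ids of all containers in the pod. The pod may therefore not be found, and the container fails to start, for example when the pod has 2 containers.
* pod.annotations: ALIYUN_COM_GPU_MEM_IDX: 0 can only ever hold one GPU id. If a pod's gpu-mem: 12 exceeds the 10G of a single GPU, the pod simply cannot be scheduled.
* gpushare-scheduler-extender is called from default-scheduler over HTTP, which hurts scheduler performance.
* Although a pod can set gpu-mem: 2, there is no way to actually limit how much memory the pod uses at the GPU level. The setting only logically limits how many pods can run on a GPU, similar to [time-slicing](https://github.com/NVIDIA/k8s-device-plugin/blob/v0.14.2/README.md#shared-access-to-gpus-with-cuda-time-slicing) in nvidia k8s-device-plugin.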
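
The arithmetic behind the first and third points above can be sketched in a few lines of Python. This is only an illustration, not the plugin's actual code: the function names `advertised_gpu_mem` and `pick_single_gpu` are hypothetical, and the sketch assumes that every GPU is advertised with the capacity of the smallest one, which is one way to reproduce the 20G total reported in the issue.

```python
from typing import List, Optional


def advertised_gpu_mem(gpu_mem_gib: List[int]) -> int:
    """Total gpu-mem units advertised for a node.

    Assumption (not taken from the plugin source): every GPU is
    advertised with the capacity of the smallest GPU on the node.
    """
    if not gpu_mem_gib:
        return 0
    return min(gpu_mem_gib) * len(gpu_mem_gib)


def pick_single_gpu(request_gib: int, free_per_gpu: List[int]) -> Optional[int]:
    """Return the index of the first GPU that can hold the whole request.

    Models the constraint that a pod is bound to exactly one GPU index,
    so a request is never split across GPUs.
    """
    for idx, free in enumerate(free_per_gpu):
        if free >= request_gib:
            return idx
    return None


node = [10, 20]                        # physical memory per GPU, in GiB
advertised = advertised_gpu_mem(node)  # units the node would expose
physical = sum(node)                   # memory that is really there

# A 12 GiB request fits in the advertised node total, but not on any
# single GPU when each GPU is advertised as 10 units.
per_gpu_advertised = [min(node)] * len(node)
placement = pick_single_gpu(12, per_gpu_advertised)
```

Under these assumptions the node exposes 20 units although 30 GiB is physically present, and the 12 GiB request gets no placement even though the node total would cover it.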

Can't download the papers

Can you send me the papers about voda to [email protected]? I can't download from this [link](https://hdl.handle.net/11296/hj4rzp). Thank you very much!

freelizhun

KEP: Error Pods Rescheduling

A new scheduler plugin for rescheduling scheduled pods which encountered some errors to other nodes

Problems currently encountered when using this project

Can't download the papers