coldzerofear

Results 7 issues of coldzerofear

The vGPU node lock plugin is currently not actually enabled. It needs to be enabled to prevent errors caused by parallel device allocation.

ok-to-test
size/S
needs-rebase
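
To illustrate why a per-node lock matters for parallel device allocation, here is a minimal in-process sketch. It is not the plugin's actual mechanism: `nodeLocks`, `allocate`, and `simulateAllocations` are hypothetical names, and a real scheduler plugin would more likely coordinate through the cluster than through a local mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeLocks is a hypothetical registry holding one mutex per node name.
type nodeLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newNodeLocks() *nodeLocks {
	return &nodeLocks{locks: make(map[string]*sync.Mutex)}
}

// forNode returns the mutex for a node, creating it on first use.
func (n *nodeLocks) forNode(name string) *sync.Mutex {
	n.mu.Lock()
	defer n.mu.Unlock()
	l, ok := n.locks[name]
	if !ok {
		l = &sync.Mutex{}
		n.locks[name] = l
	}
	return l
}

// allocate runs fn while holding the lock of the given node, so that
// allocations on the same node never interleave.
func (n *nodeLocks) allocate(node string, fn func()) {
	l := n.forNode(node)
	l.Lock()
	defer l.Unlock()
	fn()
}

// simulateAllocations starts `workers` concurrent requests against a single
// node that has `free` devices and reports how many requests were granted.
func simulateAllocations(workers, free int) (granted, remaining int) {
	locks := newNodeLocks()
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			locks.allocate("node-a", func() {
				// Check-then-decrement is only safe because the node lock is held.
				if free > 0 {
					free--
					granted++
				}
			})
		}()
	}
	wg.Wait()
	return granted, free
}

func main() {
	granted, remaining := simulateAllocations(8, 4)
	fmt.Println(granted, remaining) // prints "4 0"
}
```

Without the lock, the check and the decrement could interleave across goroutines and hand out more devices than the node has.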

There is an NPE (nil pointer) issue in the device share plugin; #3241 does not really solve the problem. Unit test: ```golang func TestDevices(t *testing.T) { others := make(map[string]interface{}) nodeDevices := vgpu.DecodeNodeDevices("k8s01", "GPU-c496852d-f5df-316c-e2d5-86f0b322ec4c,20,30720,100,NVIDIA-NVIDIA...

ok-to-test
size/S
needs-rebase

### Bug Description: Unable to open the terminal; accessing the URL hangs. ### Steps to Reproduce: my package.json ```json { "private": true, "dependencies": { "@theia/callhierarchy": "latest", "@theia/file-search": "latest", "@theia/git": "latest",...

Nil pointer panics may occur at certain stages of the device sharing plugin lifecycle. We need to avoid them as much as possible, reduce unnecessary nil checks, and improve...

size/M
needs-rebase
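
One common Go idiom for avoiding nil pointer panics while keeping nil checks out of call sites is a nil-safe pointer receiver. The sketch below only illustrates that idiom; `gpuDevice`, `gpuDevices`, and `TotalFree` are hypothetical and are not taken from the plugin's code.

```go
package main

import "fmt"

// gpuDevice and gpuDevices are hypothetical types used only for illustration.
type gpuDevice struct {
	ID      string
	MemFree int
}

type gpuDevices struct {
	Devices []*gpuDevice
}

// TotalFree is safe to call on a nil *gpuDevices and skips nil entries,
// so callers need no nil checks of their own.
func (g *gpuDevices) TotalFree() int {
	if g == nil {
		return 0
	}
	total := 0
	for _, d := range g.Devices {
		if d == nil {
			continue
		}
		total += d.MemFree
	}
	return total
}

func main() {
	var missing *gpuDevices
	fmt.Println(missing.TotalFree()) // prints 0 instead of panicking

	present := &gpuDevices{Devices: []*gpuDevice{
		{ID: "gpu-0", MemFree: 1024},
		nil,
		{ID: "gpu-1", MemFree: 2048},
	}}
	fmt.Println(present.TotalFree()) // prints 3072
}
```

Centralizing the check in the method keeps the guard in one place, which is one way to cut down on repeated nil checks elsewhere.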

When deploying LLaMA-Factory in a container, the `CUDA_VISIBLE_DEVICES` environment variable is used to select which GPUs to run on. If the container has been allocated 2 cards and `CUDA_VISIBLE_DEVICES` is manually set to a card other than 0, then querying memory usage with the nvidia-smi command always shows the usage attributed to card 0, when it should actually appear on card 1. ```txt root@chenweiyi-ed43f-0:/mnt/chenweiyi/LLaMA-Factory# nvidia-smi Fri May 24 17:59:35 2024 +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile...

The node device score is currently unusable.

size/M

**What type of PR is this?** **What this PR does / why we need it**: **Which issue(s) this PR fixes**: Fixes # **Special notes for your reviewer**: **Does this PR...