valafon issues

Results: 9 issues of valafon

Limiting GPU Resource Usage per Docker Container with MPS Daemon

I’ve been utilizing the MPS (Multi-Process Service) daemon to manage resource usage limits for processes using the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE and CUDA_MPS_PINNED_DEVICE_MEM_LIMIT environment variables, and it’s been working well. However, I’ve encountered...

Pod ignores limits.

Hello! I have launched the gpu-manager daemon set on a node. Then, I started a pod on this node which requested tencent.com/vcuda-memory:2. As I understand from the README, 1 vcuda...

restore script can't recreate membership of deleted team

If we delete a team and recreate it with the script, the team members are not restored. As far as I can see, membership is restored through the team's ID. But when the team...

Plugin doesn't respect physical cards.

Here is an example: a server has 8 GPUs, each split into 5 vGPUs, so a free node has 40 vGPUs in total. Then...

Limiting GPU Resource Usage per Docker Container with MPS Daemon

lifecycle/stale

Restart all allocations button on main job window

Proposal: Add a "Restart All Allocations" button to the job's main window in the web UI. As an example, I'm attaching a screenshot of how I see...

type/enhancement

theme/ui

stage/needs-discussion

valafon

Limiting GPU Resource Usage per Docker Container with MPS Daemon

Pod ignores limits.

restore script can't recreate membership of deleted team

Plugin doesn't respect physical cards.

Limiting GPU Resource Usage per Docker Container with MPS Daemon

Restart all allocations button on main job window

Plugin does not evenly distribute the pods.

GPU cores scheduling

Conditional JobSpec Variables for Task Based on Node Architecture