feat: Scanning GPU allocation map
Resolves https://github.com/lablup/giftbox/issues/638.
Implement an API for querying the GPU allocation map (the GPU allocation state per GPU device).
The GPU allocation map is calculated by reading the resource.txt file in each kernel's scratch directory and summing up the allocation information stored in KernelResourceSpec.
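For illustration, here is a minimal sketch of that scan-and-sum step, assuming a simplified resource.txt format. It does not use the real KernelResourceSpec parser; the scratch-directory layout, the line format, and the scan_gpu_alloc_map helper are hypothetical stand-ins.

```python
from collections import defaultdict
from decimal import Decimal
from pathlib import Path


def scan_gpu_alloc_map(scratch_root: Path) -> dict[str, Decimal]:
    """Sum per-device cuda.shares allocations over every kernel's resource.txt.

    Assumes each kernel directory holds lines like
    ``cuda.shares:<device-uuid>=0.2`` (a simplified stand-in for the
    serialized KernelResourceSpec, not the actual file format).
    """
    alloc_map: dict[str, Decimal] = defaultdict(lambda: Decimal(0))
    for resource_file in scratch_root.glob("*/config/resource.txt"):
        for line in resource_file.read_text().splitlines():
            if not line.startswith("cuda.shares:"):
                continue
            device_id, _, amount = line.removeprefix("cuda.shares:").partition("=")
            alloc_map[device_id] += Decimal(amount)
    return dict(alloc_map)
```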
Test
Tested using mock-accelerator.
Here is a simple example to test this PR.
When I specify the following two mock GPU devices in mock-accelerator.toml, I have 2 fGPUs in total.
```toml
devices = [
  { mother_uuid = "c59395cd-ac91-4cd3-a1b0-3d2568aa2d01", model_name = "CUDA GPU", numa_node = 0, subproc_count = 108, memory_size = "2G", is_mig_device = false },
  { mother_uuid = "c59395cd-ac91-4cd3-a1b0-3d2568aa2d02", model_name = "CUDA GPU", numa_node = 1, subproc_count = 108, memory_size = "2G", is_mig_device = false },
]
```
After creating two sessions with the following commands,
```console
❯ ./backend.ai session create \
    -r cpu=1 -r mem=2g -r cuda.shares=0.2 \
    cr.backend.ai/testing/ngc-pytorch:23.10-pytorch2.1-py310-cuda12.2
∙ Session ID e114540d-bd7e-4765-bb25-4b00a47feb51 is created and ready.
∙ This session provides the following app services: sshd, ttyd, jupyter, jupyterlab, vscode, tensorboard, mlflow-ui, nniboard
❯ ./backend.ai session create \
    -r cpu=1 -r mem=2g -r cuda.shares=1.2 \
    cr.backend.ai/testing/ngc-pytorch:23.10-pytorch2.1-py310-cuda12.2
∙ Session ID 91bf45c7-43f3-4c52-9e49-48d49bc897f7 is created and ready.
∙ This session provides the following app services: sshd, ttyd, jupyter, jupyterlab, vscode, tensorboard, mlflow-ui, nniboard
```
I can query gpu_alloc_map in JSON format with the following GraphQL query; the response is shown right after it.
```graphql
query ($agent_id: String!) {
  agent(agent_id: $agent_id) {
    gpu_alloc_map
  }
}
```
```json
{
  "data": {
    "agent": {
      "gpu_alloc_map": "{\"c59395cd-ac91-4cd3-a1b0-3d2568aa2d02\": \"0.80\", \"c59395cd-ac91-4cd3-a1b0-3d2568aa2d01\": \"0.60\"}"
    }
  }
}
```
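Note that gpu_alloc_map arrives as a JSON-encoded string inside the GraphQL response, so the client needs one more json.loads to turn it into a usable mapping. A minimal decoding sketch using the response above (plain Python, not the client SDK):

```python
import json
from decimal import Decimal

# The GraphQL result shown above.
response = {
    "data": {
        "agent": {
            "gpu_alloc_map": (
                '{"c59395cd-ac91-4cd3-a1b0-3d2568aa2d02": "0.80", '
                '"c59395cd-ac91-4cd3-a1b0-3d2568aa2d01": "0.60"}'
            )
        }
    }
}

# The field value is itself JSON text, so decode it once more.
alloc_map = {
    device_id: Decimal(amount)
    for device_id, amount in json.loads(
        response["data"]["agent"]["gpu_alloc_map"]
    ).items()
}
print(alloc_map)                # per-device fGPU allocations as Decimals
print(sum(alloc_map.values()))  # Decimal('1.40') == 0.2 + 1.2 requested in total
```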
We can see that the two mock GPU devices have been allocated 0.6 and 0.8 fGPU, respectively.
In the first request, 0.2 fGPU was allocated to the second GPU. In the second request, neither device had enough free capacity for 1.2 fGPU, so the request was split evenly and 0.6 fGPU was allocated to each device; a toy reproduction follows below.
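This split can be reproduced with a toy allocator. This is not the agent's actual fractional allocator; the allocate helper below only illustrates the assumed policy that a request too large for any single device is divided evenly across all devices.

```python
from decimal import Decimal


def allocate(free: dict[str, Decimal], requested: Decimal) -> dict[str, Decimal]:
    """Toy policy: use a single device if one has enough room, else spread evenly."""
    for dev, avail in free.items():
        if avail >= requested:
            return {dev: requested}
    share = requested / len(free)  # assumes the even share fits on every device
    return {dev: share for dev in free}


# Free capacity after the first request placed 0.2 fGPU on the second device.
free = {
    "c59395cd-ac91-4cd3-a1b0-3d2568aa2d01": Decimal("1.0"),
    "c59395cd-ac91-4cd3-a1b0-3d2568aa2d02": Decimal("0.8"),
}

# The second request (1.2 fGPU) fits on neither device alone, so it is split 0.6/0.6,
# which yields the final map of 0.6 and 0.8 fGPU shown above.
print(allocate(free, Decimal("1.2")))
```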
Checklist: (if applicable)
- [x] Milestone metadata specifying the target backport version
- [x] Mention of the original issue
- [x] API server-client counterparts (e.g., manager API -> client SDK)