VCS licenses are acquired per-cluster rather than per-gpu

Open kralicky opened this issue 4 years ago • 5 comments

1. Issue or feature description

I have created four clusters and installed the GPU operator in all four. Each cluster contains one node, which has been given 1 of the 8 available vGPUs from the host; the vGPUs are split across two physical GPUs providing 4 vGPUs each (everything runs on one machine with VMs). nvidia-gridd leases four licenses from the NLS, but it should only lease two.

nvidia-smi output on the host:

❯ nvidia-smi vgpu
Fri Oct  8 18:41:01 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63                 Driver Version: 470.63                    |
|---------------------------------+------------------------------+------------+
| GPU  Name                       | Bus-Id                       | GPU-Util   |
|      vGPU ID     Name           | VM ID     VM Name            | vGPU-Util  |
|=================================+==============================+============|
|   0  Tesla T4                   | 00000000:85:00.0             |   0%       |
|      3251649543  GRID T4-4C     | 0607...  instance-00000273   |      0%    |
+---------------------------------+------------------------------+------------+
|   1  Tesla T4                   | 00000000:C1:00.0             |   0%       |
|      3251635394  GRID T4-4C     | aa32...  instance-00000264   |      0%    |
|      3251643331  GRID T4-4C     | d5b3...  instance-0000026e   |      0%    |
|      3251652536  GRID T4-4C     | 6107...  instance-00000275   |      0%    |
+---------------------------------+------------------------------+------------+

kralicky avatar Oct 08 '21 18:10 kralicky

Hi @kralicky -- this is the expected behavior. You need a license per VM. It looks like you have 4 VMs, and so 4 licenses should be leased.

cdesiniotis avatar Oct 08 '21 23:10 cdesiniotis

The official documentation says the licenses are per-GPU (screenshot attached).

kralicky avatar Oct 11 '21 17:10 kralicky

@kralicky Please refer to this documentation: https://docs.nvidia.com/grid/13.0/grid-licensing-user-guide/index.html

For C-series NVIDIA vGPU deployments, one license per vGPU assigned to a VM is enforced through software. This license is valid for up to eight vGPU instances on a single GPU or for the assignment to a VM of one vGPU that is assigned all the physical GPU's frame buffer. When multiple C-series vGPUs are assigned to a single VM, one license for each vGPU assigned to the VM is required. One license is enforced through software. The remaining licenses are enforced through the EULA.
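
To make the two readings in this thread concrete, here is a minimal Python sketch that counts licenses both ways for the host shown in the `nvidia-smi vgpu` output above. The assignment list is transcribed by hand from that output, and the function names are hypothetical helpers for illustration; nothing here queries nvidia-gridd or the NLS.

```python
from collections import Counter

# (physical GPU index, VM name) pairs, transcribed by hand from the
# `nvidia-smi vgpu` output in the issue description.
VGPU_ASSIGNMENTS = [
    (0, "instance-00000273"),
    (1, "instance-00000264"),
    (1, "instance-0000026e"),
    (1, "instance-00000275"),
]


def licenses_per_vgpu(assignments):
    """Reading from the quoted guide: one license per vGPU assigned to a VM."""
    return len(assignments)


def licenses_per_physical_gpu(assignments):
    """Reading assumed by the issue author: one license per physical GPU."""
    return len({gpu for gpu, _vm in assignments})


def software_enforced_licenses(assignments):
    """Per the quoted guide, software enforces one license per VM;
    any further vGPUs on the same VM are covered by the EULA."""
    vgpus_per_vm = Counter(vm for _gpu, vm in assignments)
    return len(vgpus_per_vm)


if __name__ == "__main__":
    print("per vGPU:          ", licenses_per_vgpu(VGPU_ASSIGNMENTS))
    print("per physical GPU:  ", licenses_per_physical_gpu(VGPU_ASSIGNMENTS))
    print("software-enforced: ", software_enforced_licenses(VGPU_ASSIGNMENTS))
```

With one vGPU per VM, as here, the per-vGPU count and the software-enforced count coincide, which matches the four leases observed; the per-physical-GPU count is the two leases the issue author expected.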

shivamerla avatar Oct 12 '21 14:10 shivamerla

This doesn't make sense. Why license individual vGPUs instead of the single physical GPU? Sharing one physical GPU across several VMs is the whole use case for vGPUs. Please consider changing this.

kralicky avatar Oct 12 '21 20:10 kralicky

@kralicky we will share the feedback with appropriate teams internally, but if you can open a support case to clarify this, that would be great.

shivamerla avatar Oct 13 '21 18:10 shivamerla