Cannot use entire GPU memory
Hi,
I have an A100-PCIE-40GB GPU and I am trying to use nos MPS dynamic partitioning. The issue is that it seems to miscalculate the total capacity. For example, when I try to run 2 pods that each request `nvidia.com/gpu-20gb: 1`, one of them always stays in Pending, yet I am able to schedule 1 pod requesting `nvidia.com/gpu-20gb` together with 2 more pods requesting `nvidia.com/gpu-10gb`. I have run into this problem of not being able to use the full GPU memory with a few other combinations as well.
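For reference, this is roughly the kind of pod spec I am applying for each of the two pods (names and image are just illustrative, the relevant part is the resource limit):

```yaml
# Minimal sketch of one of the two pods; both request a 20GB MPS slice
# of the A100-PCIE-40GB, so I would expect them to fit together.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-20gb-pod-1
spec:
  containers:
    - name: app
      image: nvidia/cuda:11.8.0-base-ubuntu22.04
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu-20gb: 1
```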
Does anyone have any idea what might be going on? Any help would be much appreciated.