DeepSpeed
Retrieve CUDA available memory via `torch.cuda.mem_get_info()`
This PR refactors the `available_memory()` method of the CUDA accelerator to use `free, total = torch.cuda.mem_get_info()`. It also removes the hard dependency on `pynvml`.
Related PR:
- #4508
The `torch.cuda.mem_get_info()` function was added two years ago (May 26th, 2021). We already rely on `torch.cuda.is_bf16_supported()` without a torch version check in the next method below, and that function was added later, on August 26th, 2021. So we can assume `torch.cuda.mem_get_info()` is always available in the torch versions we support.
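For reference, a minimal sketch of the refactored accessor (the body below is illustrative, not the exact PR diff; the `device_index` parameter name is assumed from the accelerator interface):

```python
import torch

def available_memory(self, device_index=None):
    # torch.cuda.mem_get_info() wraps cudaMemGetInfo and returns
    # (free_bytes, total_bytes) for the given CUDA device ordinal,
    # honoring CUDA_VISIBLE_DEVICES (including UUID entries).
    free, _total = torch.cuda.mem_get_info(device_index)
    return free
```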
Rationale

- The official NVML Python binding package on PyPI is `nvidia-ml-py` rather than `pynvml`. See the documentation at https://pypi.org/project/pynvml:

  > This is a wrapper around the NVML library. For information about the NVML library, see the NVML developer page http://developer.nvidia.com/nvidia-management-library-nvml
  >
  > As of version 11.0.0, the NVML-wrappers used in pynvml are identical to those published through nvidia-ml-py.

- Depending on `pynvml` adds an extra dependency. It can also break a user's Python environment if they have `nvidia-ml-py` installed, because both `pynvml` and `nvidia-ml-py` provide the `pynvml` module. Relying on `torch.cuda.mem_get_info()` adds no extra dependency.

- Handling the `CUDA_VISIBLE_DEVICES` environment variable is complex: the variable can be a comma-separated list of integers or UUID strings, and currently we only support integers. `torch.cuda.mem_get_info()` calls the CUDA API directly, which needs no index conversion between CUDA and NVML (see the sketch after the linked code below).
https://github.com/microsoft/DeepSpeed/blob/6d7b44a838548d2e1878439613e1fbc17ddcfaf0/accelerator/cuda_accelerator.py#L156-L169
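The linked lines implement the NVML path being replaced. A simplified sketch of that kind of lookup, using the standard `pynvml` binding API, shows where a naive integer parse of `CUDA_VISIBLE_DEVICES` breaks down (illustrative only, not the exact DeepSpeed code):

```python
import os
import pynvml

def available_memory_via_nvml(cuda_index: int) -> int:
    # NVML enumerates *physical* GPUs, so the CUDA ordinal must first
    # be mapped back through CUDA_VISIBLE_DEVICES.
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    if visible:
        # This int() call raises ValueError for UUID entries such as
        # "GPU-3cd9eb06", which are valid values for the variable.
        nvml_index = int(visible.split(",")[cuda_index])
    else:
        nvml_index = cuda_index
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(nvml_index)
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    pynvml.nvmlShutdown()
    return info.free
```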
```
$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-3cd9eb06-03f4-3b39-2f7b-48ee826b0a26)
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-611f484b-7a5a-f1ae-5aac-64d2ddad1ab6)
GPU 2: NVIDIA GeForce RTX 3090 (UUID: GPU-ba171e16-8df7-e1c4-5468-2ee35e18d1f0)
GPU 3: NVIDIA GeForce RTX 3090 (UUID: GPU-66bd9aec-436e-24eb-91e8-d31d6370d8f0)
GPU 4: NVIDIA GeForce RTX 3090 (UUID: GPU-9cc6b251-34a2-db9d-4ca0-7532f951aad2)
GPU 5: NVIDIA GeForce RTX 3090 (UUID: GPU-a6c609c1-078d-e47e-b418-8008e61a8cf6)
GPU 6: NVIDIA GeForce RTX 3090 (UUID: GPU-be37798a-62fb-ebee-90d2-01b018d81c6d)
GPU 7: NVIDIA GeForce RTX 3090 (UUID: GPU-8b2e78db-cff8-bb89-d9fd-64f1633df658)

$ export CUDA_VISIBLE_DEVICES="GPU-ba171e16,GPU-611f484b,GPU-3cd9eb06"
$ ipython

In [1]: import torch

In [2]: torch.cuda.memory_allocated(0)
Out[2]: 0

In [3]: torch.cuda.get_device_properties(0).total_memory
Out[3]: 25447170048

In [4]: torch.cuda.mem_get_info(0)
Out[4]: (510328832, 25447170048)

In [5]: from nvitop import CudaDevice

In [6]: cuda0 = CudaDevice(0)
   ...: cuda0
Out[6]: CudaDevice(cuda_index=0, nvml_index=2, name="NVIDIA GeForce RTX 3090", total_memory=24.00GiB)

In [7]: cuda0.memory_free()
Out[7]: 510328832

In [8]: cuda0.memory_used()
Out[8]: 24936841216

In [9]: cuda0.memory_total()
Out[9]: 25769803776
```
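Note how, with UUID entries in `CUDA_VISIBLE_DEVICES`, CUDA device 0 maps to physical GPU 2 (`nvml_index=2` in the `nvitop` output), and `torch.cuda.mem_get_info(0)` reports the same free memory (510328832 bytes) as NVML without any CUDA-to-NVML index conversion.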
Hi @XuehaiPan - thank you for the contribution. If I recall correctly, we had to use `pynvml` because we were getting inaccurate memory information from `torch` in some scenarios. @jeffra may be able to comment more on this.

Either way, I will try out this branch and see if that is still the case. In particular, this code is necessary for FastGen and DeepSpeed-MII.
If we aren't able to switch over, would it at least make sense to move to the `nvidia-ml-py` package, as it is more regularly updated and at least matches the CUDA version?