Profiler tool underestimates the memory really used underneath by the GPU (NVIDIA)
Hello, I successfully ran the profiler tool on my classification model to profile its maximum memory usage, because I want to run different CNNs on the same GPU. But I'm really baffled by the results of the profiler. Let me explain.
I have an NVIDIA RTX 3090 with 24 GB of memory, so for my small CNN I set a 512 MB memory limit in my code before any GPU use:
```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
tf.config.set_logical_device_configuration(
    gpus[0],
    [tf.config.LogicalDeviceConfiguration(memory_limit=512)])
```
It seems to work, judging by the TensorFlow logs:
2022-01-19 16:24:13.615890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with **512 MB memory:** -> device: 0, name: GeForce RTX 3090, pci bus id: 0000:2d:00.0, compute capability: 8.6
nvidia-smi shows that 419 MiB of GPU memory are used.
Then I run inference on the classification model with batch size = 1, and TensorBoard shows that the model uses about 100 MiB.
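For reference, here is roughly how such a profile can be captured (a minimal sketch, not my exact notebook code; the model and the `logdir` path are placeholders standing in for my actual classification CNN):

```python
import tensorflow as tf

# Placeholder model and input standing in for my classification CNN
model = tf.keras.applications.MobileNetV2(weights=None)
x = tf.random.normal([1, 224, 224, 3])  # batch size = 1

tf.profiler.experimental.start('logdir')  # begin collecting a trace
model(x)                                  # one inference step
tf.profiler.experimental.stop()           # write the profile for TensorBoard
```

The Memory Profile tab in TensorBoard then reports the peak usage seen during this trace.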
So in theory I could have set a smaller memory limit (under 512), but the real memory use reported by nvidia-smi is 1869 MiB!
Finally, if I want a tool that tells me the real memory consumption of a model, how should I use the TensorBoard profiler? Is the TensorBoard result actually useless?
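One alternative I found, independent of TensorBoard, is to ask TensorFlow's allocator directly (a minimal sketch, assuming TF 2.5+ and that the first device is `GPU:0`):

```python
import tensorflow as tf

# Bytes allocated by TensorFlow's own allocator on the device.
# This excludes the CUDA context and the loaded cuDNN/cuBLAS kernels,
# which is part of why nvidia-smi reports a much larger total.
info = tf.config.experimental.get_memory_info('GPU:0')
print(f"current: {info['current'] / 2**20:.1f} MiB")
print(f"peak:    {info['peak'] / 2**20:.1f} MiB")
```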
OK, I've created a notebook that you can download and execute (but not on Colab, because you need exclusive access to the GPU): https://github.com/fitoule/tensorflow_gpu_memory-/blob/main/DemoMemoryIssue.ipynb
I made further investigations. Actually the command works, but the documentation is not clear enough. In my test I set memory_limit=200 and observed (measurement sketch below):

A) When I import tensorflow, the NVIDIA memory allocated is 423 MiB.
B) When I call the code with the memory limit, the NVIDIA memory allocated is 423 + 200 = 623 MiB.
C) When a first inference is called, TensorFlow adds a further 938 MiB (+ 423 + 200), for a total of 1561 MiB.
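For anyone who wants to reproduce this, here is a sketch of reading the nvidia-smi numbers programmatically. I'm assuming the pynvml package here, which my notebook does not actually use; also note that TensorFlow initializes the GPU lazily, so the exact step at which each chunk gets allocated may vary:

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # same device nvidia-smi shows

def used_mib():
    # The "used" figure that nvidia-smi reports, in MiB
    return pynvml.nvmlDeviceGetMemoryInfo(handle).used / 2**20

print('baseline:', used_mib())
import tensorflow as tf
print('A) after import:', used_mib())

gpus = tf.config.list_physical_devices('GPU')
tf.config.set_logical_device_configuration(
    gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=200)])
print('B) after setting memory_limit:', used_mib())

# ... build the model and run one inference here, then call used_mib()
# again to see the C part appear.
```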
So I understand that A + C is a constant amount that TensorFlow needs, and that memory_limit affects only the B part. I tested this on many different models; A + C depends on the driver or the GPU hardware.
So now it's clear to me. But the documentation could mention this, because for a small model of about 100 MiB I need about 1.5 GB of GPU memory, which is confusing.