
profiler tool underestimates the memory actually used by the GPU (NVIDIA)

Open fitoule opened this issue 3 years ago • 2 comments

Hello, I successfully ran the profiler tool on my classification model to profile its maximum memory usage, because I want to run several different CNNs on the same GPU. But I'm really baffled by the profiler's results. Let me explain.

I have an NVIDIA RTX 3090 with 24 GB of memory, so for my small CNN I set a 512 MB memory limit before any other GPU use, with this code:

```python
tf.config.set_logical_device_configuration(
    gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=512)])
```
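(For reference, a self-contained version of that setup would look something like the sketch below; the `gpus` lookup and the final print are additions for completeness, not part of the original snippet:)

```python
import tensorflow as tf

# Assumption: a single visible GPU at index 0. This must run before
# TensorFlow initializes the GPU, otherwise it raises a RuntimeError.
gpus = tf.config.list_physical_devices('GPU')
tf.config.set_logical_device_configuration(
    gpus[0],
    [tf.config.LogicalDeviceConfiguration(memory_limit=512)])

# Confirm the capped logical device exists.
print(tf.config.list_logical_devices('GPU'))
```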

It seems to work, judging by the TensorFlow logs:

```
2022-01-19 16:24:13.615890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 512 MB memory: -> device: 0, name: GeForce RTX 3090, pci bus id: 0000:2d:00.0, compute capability: 8.6
```

nvidia-smi shows that 419 MiB of GPU memory are in use.

Then I run inference on the classification model with batch size = 1, and TensorBoard shows that the model uses about 100 MiB.
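(The trace was presumably captured with the programmatic profiler API; a minimal sketch follows, where the MobileNetV2 placeholder and the `logdir` path are assumptions, since the issue's actual CNN is not shown:)

```python
import tensorflow as tf

# Placeholder model and input; the issue's actual CNN is not shown.
model = tf.keras.applications.MobileNetV2(weights=None)
x = tf.random.uniform((1, 224, 224, 3))  # batch size = 1

# Collect a trace around one inference, then inspect it in
# TensorBoard's Profile -> memory_profile tab.
tf.profiler.experimental.start('logdir')
model(x, training=False)
tf.profiler.experimental.stop()
```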

So theoretically I could have set an even smaller memory limit (under 512), but... the real memory use reported by nvidia-smi is 1869 MiB!

Finally, if I want a tool that tells me the real memory consumption of a model, how should I use the TensorBoard profiler? Is the TensorBoard result actually useless?
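One way to see why the two numbers disagree is to compare TensorFlow's own allocator statistics with the driver-level figure. A sketch under assumptions: `tf.config.experimental.get_memory_info` needs TF 2.5+, and `driver_used_mib` is a hypothetical helper shelling out to nvidia-smi:

```python
import subprocess
import tensorflow as tf

def driver_used_mib(gpu_index=0):
    # Driver-level usage: the same number nvidia-smi prints, which
    # includes the CUDA context and loaded kernels, not just TF tensors.
    out = subprocess.check_output(
        ['nvidia-smi', '-i', str(gpu_index),
         '--query-gpu=memory.used', '--format=csv,noheader,nounits'])
    return int(out.decode().strip())

# ... run inference here ...

# TF's allocator view (bytes); roughly what the profiler reports.
info = tf.config.experimental.get_memory_info('GPU:0')
print('TF peak allocator use:', info['peak'] / 2**20, 'MiB')
print('Driver-reported use  :', driver_used_mib(), 'MiB')
```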

fitoule · Jan 19 '22 16:01

OK, I've created a notebook that you can download and execute (but not on Colab, because you need exclusive access to the GPU): https://github.com/fitoule/tensorflow_gpu_memory-/blob/main/DemoMemoryIssue.ipynb

fitoule · Jan 20 '22 15:01

I made further investigations. Actually the command works, but the documentation is not clear enough. In my test I set memory_limit=200:

A) When I import tensorflow, the NVIDIA memory allocated is 423 MiB.
B) When I run the code with the memory limit, the NVIDIA memory allocated is 423 + 200 = 623 MiB.
C) When a first inference is run, TensorFlow adds a further 938 MiB (+ 423 + 200), for a total of 1561 MiB.
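A minimal way to reproduce this staged measurement (a sketch only; the `used_mib` helper and the MobileNetV2 placeholder are assumptions, not the author's notebook):

```python
import subprocess

def used_mib():
    # Ask nvidia-smi for device 0's currently used memory, in MiB.
    out = subprocess.check_output(
        ['nvidia-smi', '-i', '0', '--query-gpu=memory.used',
         '--format=csv,noheader,nounits'])
    return int(out.decode().strip())

import tensorflow as tf                       # stage A
print('after import:', used_mib(), 'MiB')

gpus = tf.config.list_physical_devices('GPU')
tf.config.set_logical_device_configuration(   # stage B
    gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=200)])
print('after memory limit:', used_mib(), 'MiB')

model = tf.keras.applications.MobileNetV2(weights=None)
model(tf.random.uniform((1, 224, 224, 3)))    # stage C: first inference
print('after first inference:', used_mib(), 'MiB')
```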

So I understand that A + C is a constant overhead needed by TensorFlow, and that memory_limit affects only the B part. I tested on many different models; A + C depends on the driver or the GPU hardware.

So now it's clear to me. But the documentation could mention this, because seeing a small model of about 100 MiB need 1.5 GB of GPU RAM is confusing.

fitoule · Jan 21 '22 14:01