Monitoring actual GPU memory usage
Describe the problem the feature is intended to solve
I have several models loaded and I'm not sure how to tell whether TensorFlow still has memory left. I can check with nvidia-smi how much memory TensorFlow has allocated, but I couldn't find a way to check the loaded models' usage.
Describe the solution
TensorFlow could provide Prometheus metrics for the actual GPU memory usage of each loaded model.
Describe alternatives you've considered
None.
Additional context
I am not sure whether this is actually a feature request or whether it can already be done somehow.
@ctuluhu, can you please check this link and let us know if it helps?
@rmothukuru Thank you for your response, but I didn't find what I am looking for in the provided link. What I need is a way to get the amount of free memory within what TensorFlow has allocated.
Issue example: TensorFlow allocated 6 GB of memory, and I later loaded two models. How can I know how much of this 6 GB is used by the loaded models and how much is free?
Hi there, we can easily export metrics that tell you host memory consumption on a per-model basis, but I think you're specifically looking for GPU memory consumption/availability, correct? This is not straightforward, but we will discuss it internally to understand exactly how difficult it would be and provide an update. Let us know if you have any additional info that you think might be useful for us to know!
Hi, yes I am looking for a way to check GPU's memory availability. Such a feature would be great.
We also ran into this problem. TF Serving occupies all GPU memory when it starts, and there is no way to know how much memory a specific model really needs. If we deploy too many models in one server instance, it sometimes hangs and stops responding, and all connections to it time out. Thus, for multiple models, we need to do a lot of load testing to decide which models can be deployed together in one instance and which need to be deployed in another.
@ctuluhu @troycheng @unclepeddy one way to mitigate the problem is to use the environment flag TF_FORCE_GPU_ALLOW_GROWTH=true when you launch your model server. It'll grab the minimum required GPU memory at startup and gradually increase the consumption as needed. Please let me know if it works.
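For reference, a minimal sketch of what that launch might look like (the model name, base path, and ports below are placeholders for illustration, not anything from this issue):

```python
# Hypothetical launch script: set TF_FORCE_GPU_ALLOW_GROWTH so the GPU
# allocator starts small and grows on demand instead of reserving the
# whole GPU at startup.
import os
import subprocess

env = os.environ.copy()
env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

subprocess.Popen(
    [
        "tensorflow_model_server",
        "--port=8500",                         # gRPC
        "--rest_api_port=8501",                # REST
        "--model_name=my_model",               # placeholder
        "--model_base_path=/models/my_model",  # placeholder
    ],
    env=env,
)
```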
Please also see this Stack Overflow question about how to monitor memory usage using the memory_stats ops and run_metadata.
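For completeness, a rough TF 1.x sketch of that approach, assuming the tf.contrib.memory_stats ops are what the Stack Overflow answer refers to (this runs inside your own Python process, not inside TF Serving):

```python
# TF 1.x only: the memory_stats ops report the allocator's current and
# peak bytes in use for the device they are placed on.
import tensorflow as tf

with tf.device("/gpu:0"):
    bytes_in_use = tf.contrib.memory_stats.BytesInUse()
    max_bytes_in_use = tf.contrib.memory_stats.MaxBytesInUse()

with tf.Session() as sess:
    # ... run your model's fetches here ...
    current, peak = sess.run([bytes_in_use, max_bytes_in_use])
    print("current GPU bytes in use:", current)
    print("peak GPU bytes in use   :", peak)
```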
Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!
Is there any update on this issue? In TF 2.2 I still don't see an easy way to measure actual and peak memory usage.
Why was this closed?
@ctuluhu @troycheng @unclepeddy one way to mitigate the problem is to use the environment flag TF_FORCE_GPU_ALLOW_GROWTH=true when you launch your model server. It'll grab the minimum required GPU memory at startup and gradually increase the consumption as needed. Please let me know if it works.
I believe this can be considered a basic solution to the problem. But GPU memory usage cannot be fully separated per loaded model, because part of it is consumed by things like the CUDA context, which is shared among loaded models.
Meanwhile, it seems there should be a limit on each model's GPU memory growth, which should be related to the model's parameters and the max batch size configured in TF Serving.
Also, TF_FORCE_GPU_ALLOW_GROWTH=true should not affect TF Serving's latency for handling requests after the first one (assuming memory is allocated for the entire batch size). The GPU memory allocated earlier does not seem to be deallocated if no further requests arrive.
@gaocegege
Any workaround for this? At least via the metrics in Prometheus?
We still don't see an easy way to monitor GPU memory usage. Is there any progress?
It's been 2 years and 4 days, and we still don't have any update on one of the most vital parts.
Great!
Sorry for the late reply.
Could you try the memory profiling tool (https://www.tensorflow.org/guide/profiler#memory_profile_tool) to see if it helps? You could also take a look at https://www.tensorflow.org/api_docs/python/tf/config/experimental/get_memory_info, which provides the current and peak memory that TensorFlow is actually using. However, this might not work here, since the models run online on C++ model servers.
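A small sketch of what that looks like from Python (this only sees the TensorFlow allocator inside the current Python process, so it won't reflect a separate C++ model server; "GPU:0" is assumed to be the device of interest):

```python
import tensorflow as tf

# Optional: let the allocator grow on demand instead of reserving the whole GPU.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

# ... build and run a model here ...

# Current and peak bytes used by TensorFlow's GPU:0 allocator (recent TF 2.x).
info = tf.config.experimental.get_memory_info("GPU:0")
print("current bytes:", info["current"])
print("peak bytes   :", info["peak"])
```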
@guanxinq I think people are more interested in something from the tensorflow/serving side!
An example use case would be monitoring GPU usage while serving models in production. If that information were available through some API endpoint, it could be used to scale the cluster or increase the number of backend model workers. nvidia-smi and related implementations exist, but something directly from TF Serving would definitely be more useful.
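In the meantime, a hedged sketch of the nvidia-smi-style workaround: query device-level memory via NVML from a sidecar and export it however you like. The pynvml package and the single-GPU index are assumptions here, and this reports whole-device usage, not per-model usage:

```python
# Requires the nvidia-ml-py / pynvml package; reports the same numbers
# nvidia-smi shows, for the whole device rather than per loaded model.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print("total bytes:", mem.total)
    print("used bytes :", mem.used)
    print("free bytes :", mem.free)
finally:
    pynvml.nvmlShutdown()
```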