
Question: How to estimate memory requirements for a certain batch size / model size?


I was wondering how GPU memory requirements vary with model size, request batch size, and max tokens. In some experiments where I needed the server to keep running for a long time, I found that it often ran out of memory and shut down. Is there a way to estimate the memory footprint based on these variables?

vaishakkrishna · Jun 29 '23 15:06

Unfortunately not at the moment. https://github.com/huggingface/text-generation-inference/issues/478 might help reduce memory usage.

Other than that, --max-batch-total-tokens is really the variable you need to set to control the amount of memory you're going to need.
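
As a rough back-of-envelope only (it ignores activations, CUDA context and allocator fragmentation, and assumes standard multi-head attention in float16), memory is dominated by the weights plus the KV cache, and the KV cache grows linearly with the number of tokens in flight:

```
total GPU memory ≈ model weights + KV cache (+ overhead)

model weights ≈ n_params × bytes_per_param
                e.g. a 7B model in float16: 7e9 × 2 bytes ≈ 14 GB

KV cache      ≈ 2 × n_layers × hidden_size × bytes_per_param × (tokens in flight)
                e.g. Llama-7B (32 layers, hidden 4096) in float16:
                2 × 32 × 4096 × 2 bytes ≈ 0.5 MB per token,
                so --max-batch-total-tokens 16000 ≈ 8 GB of KV cache
```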

Run text-generation-launcher --help for further information; the other variables listed there can help control memory as well.
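
For illustration, a launch that caps the total tokens in a batch could look something like this (the model id and numbers are placeholders, and flag names can change between versions, so double-check them against --help):

```
text-generation-launcher \
    --model-id <your-model> \
    --max-input-length 1024 \
    --max-total-tokens 2048 \
    --max-batch-total-tokens 16000
```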

Narsil · Jun 30 '23 07:06

You can use the benchmarking tool to make sure that you don't OOM at a given setting, and then pass those values as the maximums in the launcher.
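
For reference, the benchmark binary ships alongside the server and is run against an already-running instance of your model; a hypothetical invocation might look like the following (the model id is a placeholder and flag names may differ by version, see text-generation-benchmark --help):

```
text-generation-benchmark \
    --tokenizer-name <your-model> \
    --batch-size 32 \
    --sequence-length 512 \
    --decode-length 128
```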

OlivierDehaene · Jun 30 '23 08:06

Thank you all, I'll give these approaches a shot.

vaishakkrishna · Jul 03 '23 01:07