
Model Memory Calculator for Different Input Tokens

Open ZorkJ opened this issue 1 year ago • 1 comment

Hi, I am curious about how to calculate the GPU memory requirement of an LLM for different input token counts. From Qwen2's official benchmarks, we found that the same model requires very different amounts of GPU memory depending on the input length, and the differences are huge. For example, the Qwen2-72B-Instruct GPTQ-Int4 model needs 41.80 GB of GPU memory for an input length of 1, 47.90 GB for 6144, 57.79 GB for 14336, and 107.94 GB for 30720. Therefore, I am wondering whether there is a method to calculate the GPU memory requirement of a specific LLM for a given input token count.

ZorkJ avatar Jul 04 '24 01:07 ZorkJ

The model memory calculator does not take into account the amount of memory for activations, only for loading the model, the gradients, and the optimizer states. Calculating the size of activations is very difficult, as it depends on many factors: sequence length (as you mentioned), batch size, dtypes, activation checkpointing, etc. The most difficult part is that each model architecture will have different requirements, depending on implementation details, which makes it very hard to calculate this value.
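That said, for inference a rough lower bound on the sequence-length-dependent part is the KV cache, which can be estimated from the model config alone. Below is a minimal sketch; the Qwen2-72B values (80 layers, 8 KV heads via GQA, head dim 128) and the fp16 cache dtype are assumptions based on the published config, not measured values.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   dtype_bytes=2, batch_size=1):
    """Estimate KV-cache size: 2 tensors (K and V) per layer, each of
    shape (batch, num_kv_heads, seq_len, head_dim), dtype_bytes per element."""
    return (2 * num_layers * batch_size * num_kv_heads
            * seq_len * head_dim * dtype_bytes)

# Assumed Qwen2-72B config: 80 layers, 8 KV heads, head_dim 128, fp16 cache
for seq_len in (1, 6144, 14336, 30720):
    gib = kv_cache_bytes(seq_len, num_layers=80,
                         num_kv_heads=8, head_dim=128) / 1024**3
    print(f"seq_len={seq_len:>6}: ~{gib:.2f} GiB KV cache")
```

Note this only accounts for part of the growth you observed; intermediate activations inside the attention and MLP blocks (plus framework overhead) add more on top, and those are the architecture- and implementation-dependent parts that are hard to estimate generically.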

BenjaminBossan avatar Jul 04 '24 11:07 BenjaminBossan

I see your point. Thanks for your comment.

ZorkJ avatar Jul 05 '24 02:07 ZorkJ

I found this script by EleutherAI which might help you:

https://github.com/EleutherAI/cookbook/blob/main/calc/calc_transformer_mem.py

I haven't tested it yet but it's very detailed, so hopefully it helps.

BenjaminBossan avatar Jul 05 '24 10:07 BenjaminBossan

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Aug 03 '24 15:08 github-actions[bot]