InternVL [Feature] Roughly how much GPU memory do training and inference with InternVL2-Llama3-76B require?

Motivation

I am getting an out-of-memory error, which I assume means GPU memory is insufficient. So far I have only tested inference.

Related resources

Insufficient GPU memory

Additional context

No response

Aug 08 '24 01:08 lckj2009

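Hello, which GPU are you using?

With BF16 or FP16 precision, every 1B parameters needs about 2 GB of GPU memory, so 76B needs about 152 GB in total, which takes 2-3 A100 80G GPUs. With AWQ INT4, every 1B parameters needs about 0.5 GB, so 76B needs about 38 GB in total, and a single A100 80G is enough.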
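The arithmetic above can be sketched as a small calculation. This is only a rough illustration: the helper names `weight_memory_gb` and `min_gpus` are hypothetical, and the result is a lower bound from the weights alone, assuming 2 bytes per parameter for BF16/FP16 and 0.5 bytes for AWQ INT4.

```python
import math

# Approximate bytes per parameter for the precisions discussed in this thread.
BYTES_PER_PARAM = {"bf16": 2.0, "fp16": 2.0, "awq-int4": 0.5}


def weight_memory_gb(params_billion, precision):
    """Approximate weight memory in GB (1B parameters at 1 byte each is ~1 GB)."""
    return params_billion * BYTES_PER_PARAM[precision]


def min_gpus(params_billion, precision, gpu_gb):
    """Lower bound on the GPU count, counting model weights only."""
    return math.ceil(weight_memory_gb(params_billion, precision) / gpu_gb)


for precision in ("bf16", "awq-int4"):
    need = weight_memory_gb(76, precision)
    count = min_gpus(76, precision, 80)
    print(f"{precision}: ~{need:.0f} GB of weights, at least {count} x 80G GPU(s)")
```

The estimate leaves out the KV cache, activations, and other runtime buffers, so a real deployment needs headroom beyond these figures.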

Aug 08 '24 10:08 czczup

I am using the official weights from HF [model-00001-of-00032.safetensors] without any conversion, so I assume that needs the maximum amount of GPU memory.

Aug 09 '24 01:08 lckj2009

My GPU is an A100 with 40G of memory. Even when running lmdeploy chat /root/InternVL/ --cache-max-entry-count 0.01 with quantization, I still get this error: RuntimeError: [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/memory_utils.cu:32

Aug 09 '24 03:08 lckj2009

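Hello, a single 40G GPU is probably not enough either: the model weights alone nearly fill it, and computation needs some additional GPU memory.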

Aug 09 '24 05:08 czczup

Thanks. In issues/480 I cannot select a model: the model and parameter drop-down menus are not selectable. Is that related to this?

Aug 12 '24 02:08 lckj2009

Yes.

Nov 23 '24 15:11 czczup