OpenGVLab/InternVL3_5-241B-A28B fails to load on 16× A100 40GB GPUs (OOM issue)
I am trying to load InternVL3_5-241B-A28B for inference on 16× A100 40GB GPUs, but I consistently run into an out-of-memory (OOM) error.
Here is the command I used:
lmdeploy serve api_server OpenGVLab/InternVL3_5-241B-A28B \
    --server-port 23333 \
    --tp 16 \
    --backend turbomind \
    --cache-max-entry-count 0.05 \
    --session-len 512 \
    --dtype bfloat16 \
    --quant-policy 4
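For reference, here is the back-of-envelope arithmetic I used to sanity-check the setup. The parameter count and the even-sharding assumption are my own estimates, not figures from the paper:

# Rough memory estimate (my own assumptions, not from the InternVL3.5 paper)
params = 241e9            # total parameters of InternVL3_5-241B-A28B
bytes_per_param = 2       # bfloat16
num_gpus = 16
gpu_mem_gb = 40

weights_gb = params * bytes_per_param / 1e9   # ~482 GB for all weights
per_gpu_gb = weights_gb / num_gpus            # ~30 GB, assuming even TP-16 sharding

print(f"total weights : {weights_gb:.0f} GB")
print(f"per-GPU shard : {per_gpu_gb:.1f} GB")
print(f"headroom/GPU  : {gpu_mem_gb - per_gpu_gb:.1f} GB")  # ~10 GB left

If that estimate is roughly right, each GPU is left with only about 10 GB for the KV cache, activations, the vision encoder, and CUDA/runtime buffers, which is why I already lowered --cache-max-entry-count and --session-len, yet the OOM persists.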
According to Table 18 of the InternVL3.5 paper, 8× A100 GPUs should be sufficient to run inference on InternVL3_5-241B-A28B.
I would like to understand whether there is an issue with my command that causes the OOM error, or whether this hardware configuration is simply insufficient.