InternVL
InternVL copied to clipboard
[Bug] InternVL3_5-30B-A3B VIT low performance
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
I tested the throughput of InternVL3_5-30B-A3B and found that the model's TTFT performance was very poor. I changed the test method to use plain text input and found that TTFT performance improved significantly, suggesting that a serious performance bottleneck exists in the VIT stage.
Reproduction
vllm serve /data/models/InternVL3.5-30B-A3B
--trust-remote-code
--tensor-parallel-size 1
--gpu-memory-utilization 0.2
--max-model-len 16384
--max-num-seqs 16
--host 0.0.0.0
--port 8000
--served-model-name InternVL3.5-30B-A3B
Environment
InternVL3.5-30B-A3B
Error traceback