[Feature] Support TP for VLM models

Open • KerwinKai opened this issue 4 months ago • 3 comments

Motivation

Support tensor parallelism (TP) for Qwen2.5-VL. GPU memory usage is 78.62 GB with TP=1 and 43.56 GB with TP=4.

Modifications

  • Add qwen2_5_vl.py for the target model.

  • Add QKVParallelLinear to linear.py, since the Qwen2_5_VLVisionAttention class needs it.
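For context on why the vision attention needs a dedicated QKV layer: assuming the Hugging Face layout, Qwen2_5_VLVisionAttention projects Q, K and V with a single fused linear (all Q rows, then all K rows, then all V rows). Splitting that weight into contiguous row chunks would give a rank a mix of Q and K rows rather than a subset of heads. The sketch below is a NumPy simulation of the sharding rule only; shard_fused_qkv is a hypothetical helper, not the PR's QKVParallelLinear, and it assumes plain multi-head attention with no grouped KV heads.

```python
import numpy as np


def shard_fused_qkv(qkv_weight, num_heads, head_dim, tp_rank, tp_size):
    """Hypothetical helper: slice a fused [Q; K; V] weight for one TP rank.

    qkv_weight has shape (3 * num_heads * head_dim, hidden_size), laid out as
    all Q rows, then all K rows, then all V rows, as in nn.Linear(dim, 3 * dim).
    Assumes plain multi-head attention (no grouped KV heads).
    """
    if num_heads % tp_size != 0:
        raise ValueError(f"num_heads={num_heads} not divisible by tp_size={tp_size}")
    rows = (num_heads // tp_size) * head_dim  # rows per rank for each of Q, K, V
    start, end = tp_rank * rows, (tp_rank + 1) * rows
    q, k, v = np.split(qkv_weight, 3, axis=0)
    # A contiguous chunk of the fused weight would mix Q/K/V rows; instead take
    # the same head range from each projection and re-fuse it locally.
    return np.concatenate([q[start:end], k[start:end], v[start:end]], axis=0)
```

With num_heads=4 and tp_size=2, rank 0 ends up with heads 0-1 of Q, K and V, and rank 1 with heads 2-3, so each rank can run attention on its own heads without communication until the output projection.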

Related Issues

https://github.com/sgl-project/SpecForge/issues/166

Pending TODO

  • Accuracy testing.

  • Support TP=8, which is currently blocked because num_attention_heads in config.json is not divisible by 8.
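The TP=8 blocker comes down to a divisibility check on the head counts. The sketch below assumes the rule commonly used by vLLM/SGLang-style tensor parallelism: query heads must split evenly across ranks, and KV heads must either split evenly or be replicated a whole number of times. The function name valid_tp_sizes and the head counts in the example (28 query heads, 4 KV heads) are illustrative assumptions, not values taken from this PR; check the actual model's config.json.

```python
def valid_tp_sizes(num_attention_heads, num_key_value_heads, candidates=(1, 2, 4, 8)):
    """Return the TP sizes for which attention heads can be sharded.

    Assumed rules (not taken from this PR): query heads divide evenly across
    ranks; KV heads either divide evenly or are replicated tp // kv times.
    """
    valid = []
    for tp in candidates:
        q_ok = num_attention_heads % tp == 0
        kv_ok = num_key_value_heads % tp == 0 or tp % num_key_value_heads == 0
        if q_ok and kv_ok:
            valid.append(tp)
    return valid


# Illustrative (assumed) head counts: 28 query heads, 4 KV heads.
print(valid_tp_sizes(28, 4))  # [1, 2, 4]
```

Under these assumptions TP=8 fails on the query heads alone (28 % 8 == 4), so KV-head replication would not be enough to unblock it.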

Checklist

  • [ ] Format your code according to the Code Formatting with Pre-Commit.
  • [ ] Add unit tests as outlined in the Running Unit Tests.
  • [ ] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • [ ] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
  • [ ] Please feel free to join our Slack channel at https://sgl-fru7574.slack.com/archives/C09784E3EN6 to discuss your PR.

KerwinKai • Sep 01 '25 12:09

[screenshot] The lm_head is column-parallel, but you did not perform a gather operation here.
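To illustrate the review point: with a column-parallel lm_head, each rank holds only a slice of the vocabulary, so a loss or argmax computed on the local logits covers only vocab_size / tp_size tokens. The sketch below simulates this on one process with NumPy; column_parallel_logits is a hypothetical name, and the concatenation stands in for the real collective, which in PyTorch would typically be torch.distributed.all_gather into a list of tensors followed by torch.cat(..., dim=-1). It assumes vocab_size is divisible by tp_size.

```python
import numpy as np


def column_parallel_logits(hidden, lm_head_weight, tp_size):
    """Simulate a column-parallel lm_head on one process (no process group).

    lm_head_weight has shape (vocab_size, hidden_size); it is split along the
    vocab dimension, so each rank computes logits for vocab_size / tp_size tokens.
    """
    shards = np.split(lm_head_weight, tp_size, axis=0)  # one vocab slice per rank
    local_logits = [hidden @ w.T for w in shards]       # what each rank computes
    # Stand-in for all_gather + concatenation along the vocab (last) dimension.
    gathered = np.concatenate(local_logits, axis=-1)
    return local_logits, gathered
```

An alternative that avoids materializing full logits on every rank is a vocab-parallel cross-entropy (as in Megatron-LM); which approach fits depends on how the logits are consumed downstream.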

oswen • Sep 11 '25 13:09

@FrankLeeeee Hi, could you help review this VL PR?

zyksir • Oct 20 '25 14:10

@KerwinKai Is it working properly now?

ggg-s • Nov 04 '25 09:11