Lzhang-hub
> I added `--use-distributed-optimizer` and got a new error. Env: 4*8=32 A100 GPUs, tp2 pp8
>
> ```
> [2024-06-03 08:11:01,842] [INFO] [ckpt_saver.py:892:commit_checkpoint] The number of ready shards is 26 != 32....
> ```
Is it because the first `` is in the chat template? The model's first output token is then not ``, so open-webui cannot get it. https://huggingface.co/Qwen/QwQ-32B/blob/main/tokenizer_config.json
> @Lzhang-hub Could we maybe, instead of the warning, check the version of flash attention installed and error out if the version number is too low? Also, please sign your...
@mickqian I used the latest version to run the qwen2.5-vl-7b model with this command:

```
python -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct --host 0.0.0.0 --port 8080 --chat-template qwen2-vl --chunked-prefill-size -1 --disable-radix-cache --mm-attention-backend fa3 --attention-backend fa3...
```
@endurehero Which tools did you use to measure the time cost? Thank you.
Great job! If I want to participate in the VLM work, what can I do?
> Is it tested for MoE?

Added acc and perf bench for `Qwen3-VL-235B-A22B-Instruct`.
> Please add a unit test for this model dp.

Done.