Lzhang-hub

Results 35 comments of Lzhang-hub

> I added `--use-distributed-optimizer` and got a new error. Env: 4*8=32 A100 GPUs, tp2 pp8 > > ``` > [2024-06-03 08:11:01,842] [INFO] [ckpt_saver.py:892:commit_checkpoint] The number of ready shards is 26 != 32....

It is because the first `` is in the chat template, so the model's first output token is not ``, and open-webui cannot receive it. https://huggingface.co/Qwen/QwQ-32B/blob/main/tokenizer_config.json

> @Lzhang-hub Could we maybe, instead of the warning, check the version of flash attention installed and error out if the version number is too low? Also, please sign your...
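The check suggested above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the minimum version tuple and the helper names are assumptions made for the example.

```python
# Hedged sketch: fail fast when the installed flash-attn is older than a
# required minimum, instead of only printing a warning.
from importlib.metadata import PackageNotFoundError, version

MIN_FLASH_ATTN = (2, 3, 0)  # hypothetical minimum; pick the real floor


def parse_version(v: str) -> tuple:
    # Keep only the leading numeric components,
    # e.g. "2.5.8.post1" -> (2, 5, 8).
    parts = []
    for piece in v.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break
    return tuple(parts)


def check_flash_attn() -> None:
    try:
        installed = version("flash_attn")
    except PackageNotFoundError:
        raise RuntimeError("flash-attn is not installed")
    if parse_version(installed) < MIN_FLASH_ATTN:
        raise RuntimeError(
            f"flash-attn {installed} is too old; need >= "
            + ".".join(map(str, MIN_FLASH_ATTN))
        )
```

Comparing version tuples lexicographically handles the common `major.minor.patch` case; a real implementation would likely use `packaging.version.Version` to handle pre-release and post-release suffixes correctly.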

@mickqian I use the latest version to run the qwen2.5-vl-7b model with the command ``` python -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct --host 0.0.0.0 --port 8080 --chat-template qwen2-vl --chunked-prefill-size -1 --disable-radix-cache --mm-attention-backend fa3 --attention-backend fa3...

@endurehero ![image](https://github.com/user-attachments/assets/70b59279-9bf6-464a-b8cf-475ba3fd07d7) Which tool did you use to measure the time cost? Thank you.

Great job! If I want to contribute to the VLM work, what can I do?

> Is it tested for MoE? Added accuracy and performance benchmarks for `Qwen3-VL-235B-A22B-Instruct`.

> Please add a unit test for this model dp. Done.