Weihang Wang
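I sent the request using the author's interface, but the `obj_resp` I got back seems to have changed. Could the author please take a look?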
Thanks for your work! It is very valuable. I would like to know how you reached your conclusion about token routing, since the input is affected by attention and RoPE, it...
Which paper proposed the in-batch debiased cross-entropy loss? Could you point me to the relevant literature?
My vLLM version is 0.11.0. I deployed the model using the officially recommended command:
```
vllm serve Qwen/Qwen3-VL-235B-A22B-Instruct \
  --tensor-parallel-size 8 \
  --max-model-len 128000 \
  --async-scheduling \
  --enable-expert-parallel
```
I...