q yao

Results 318 comments of q yao

> > Also, the distilled r1 models should all be supported by turbomind, and AWQ performance is better with turbomind too; if you are after performance, turbomind is the recommendation. > > We are using turbomind. Performance is indeed nearly 70% faster than the latest vLLM. Very nice. Is there also a way to pass a conversation list directly? Is it enough to simply replace the line messages=[{"role": "user", "content": prompt}] with a full messages list that includes a system prompt? Because when using the API, the client does not know which chat template is applied and cannot get the server model's tokenizer. Also, the OpenAI API has changed and split batch inference out into a separate form. Is performance the same that way? >...
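For reference, the OpenAI-style messages format is just a list of role/content dicts, so a system prompt is one extra entry at the front; the server applies its own chat template, which is why the client never needs the tokenizer. A minimal sketch (the helper name and prompt strings are illustrative, not lmdeploy API):

```python
def build_messages(prompt, system_prompt=None):
    """Wrap a user prompt (and optional system prompt) as a messages list."""
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    return messages

# Single turn, no system prompt -- equivalent to the original one-liner.
print(build_messages("hello"))
# Full conversation with a system prompt prepended.
print(build_messages("hello", system_prompt="You are a helpful assistant."))
```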

GCC 4.8.5 is not compatible with your Triton environment. We run a simple add kernel as an environment check before launching the engine. https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/pytorch/check_env/triton_custom_add.py Please make sure that `custom_add` can...
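The general shape of such a pre-launch check (run a tiny kernel, fail fast with a readable error) can be sketched in plain Python; the function names here are hypothetical stand-ins, not lmdeploy's actual check code:

```python
def check_environment(checks):
    """Run each (name, callable) pair; raise a readable error on the first failure."""
    for name, check in checks:
        try:
            check()
        except Exception as exc:
            raise RuntimeError(
                f"Environment check '{name}' failed: {exc}. "
                "Please make sure your toolchain (e.g. gcc/triton) is compatible."
            ) from exc

def fake_custom_add():
    """Stand-in for the real Triton `custom_add` kernel check."""
    assert 1 + 1 == 2

check_environment([("custom_add", fake_custom_add)])
print("all environment checks passed")
```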

This error comes from `custom_add`: https://github.com/InternLM/lmdeploy/blob/4492df812363d112aed9151a3ff7e26a654d36ea/lmdeploy/pytorch/check_env/__init__.py#L77-L81 Please double-check `custom_add` and make sure the code above runs successfully.

I still cannot reproduce the error. Since the report mentions a cuBLAS sgemm error, try replacing https://github.com/InternLM/lmdeploy/blob/2e49fc33916dc4a9feb63d4cd57b6be862000f93/lmdeploy/pytorch/backends/default/rotary_embedding.py#L33-L34 with ```python
freqs = (inv_freq_expanded.float() * position_ids_expanded.float()).transpose(1, 2)
```
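For context, the tensor built there is the standard rotary-embedding frequency table, freqs[p][i] = position[p] * inv_freq[i]; the `.float()` casts just keep that outer product in full precision instead of a low-precision cuBLAS path. A minimal pure-Python rendering of the same math (shapes simplified, no torch):

```python
def rope_freqs(positions, dim, base=10000.0):
    """Rotary embedding frequency table as an outer product.

    inv_freq[i] = base ** (-2*i / dim), one entry per channel pair.
    Returns freqs[p][i] = positions[p] * inv_freq[i], computed entirely
    in Python floats (the full-precision path the patch above forces).
    """
    inv_freq = [base ** (-2.0 * i / dim) for i in range(dim // 2)]
    return [[float(p) * f for f in inv_freq] for p in positions]

freqs = rope_freqs(positions=[0, 1, 2], dim=8)
# position 0 gives all-zero angles; position 1 reproduces inv_freq itself.
```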

I have asked an expert; the error might come from the vision model running on the default stream, which could corrupt the graph capture of the language model on the other stream. I...

We capture multiple graphs with different input sizes, and the input is padded to the matching capture size before the forward pass. It is safe to use dynamic batching.
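The padding step can be sketched as: pick the smallest pre-captured size that fits the batch, then pad up to it (the capture sizes below are illustrative, not the engine's actual list):

```python
def pick_capture_size(batch_size, capture_sizes):
    """Return the smallest captured graph size that can hold the batch."""
    for size in sorted(capture_sizes):
        if size >= batch_size:
            return size
    raise ValueError(f"batch_size {batch_size} exceeds the largest capture size")

def pad_batch(batch, capture_sizes, pad_item=None):
    """Pad a batch to its capture size so a pre-captured graph can replay it."""
    target = pick_capture_size(len(batch), capture_sizes)
    return batch + [pad_item] * (target - len(batch))

sizes = [1, 2, 4, 8]          # illustrative capture sizes
padded = pad_batch(["a", "b", "c"], sizes)
print(len(padded))  # 4: a batch of 3 replays through the size-4 graph
```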

https://github.com/grimoire/lmdeploy/tree/fix-vl-graphcapture I have set the capture mode to thread_local, which might fix the bug. > What is the specific capture strategy like? https://github.com/grimoire/lmdeploy/blob/e16c49170f1413f23c03cac2d3549ca7b7f711c4/lmdeploy/pytorch/backends/cuda/graph_runner.py#L133 The engine would generate graphs with token...

Are you using the main branch of my repo? I have created a draft PR: https://github.com/InternLM/lmdeploy/pull/2560. Please try it.