windreamer

Results: 22 comments of windreamer

> > After some digging, I think I have enabled response_format for the OpenAI API Server in the last commit of the PR. Maybe you can give it a try? ...

> ```python
> import json
> from typing import List
>
> from openai import OpenAI
> from pydantic import BaseModel
>
>
> class StoryOutput(BaseModel):
>     title: str
>     characters: List[str]
> ...
> ```
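To make the pattern above concrete, here is a minimal end-to-end sketch, assuming lmdeploy's OpenAI-compatible api_server is running on its default port (23333); the model name and prompt are placeholders:

```python
import json
from typing import List

from openai import OpenAI
from pydantic import BaseModel


class StoryOutput(BaseModel):
    title: str
    characters: List[str]


# Point the stock OpenAI client at the local OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")

# Ask the server to constrain generation to the StoryOutput JSON schema.
response = client.chat.completions.create(
    model="internlm2_5-7b-chat",  # placeholder: use the name your server reports
    messages=[{"role": "user", "content": "Write a short story outline."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "story_output",
            "schema": StoryOutput.model_json_schema(),
        },
    },
)

# The message content should be JSON conforming to the schema.
story = StoryOutput(**json.loads(response.choices[0].message.content))
print(story.title, story.characters)
```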

@sunskyx can you try this way? https://github.com/InternLM/lmdeploy/pull/3925#issuecomment-3252045525 Sadly, we do not have a usable ROCm environment to test and resolve this right now. @Vivicai1005 do you have any idea about it?

Can you kindly attach **the output of the following command** to help us debug?

```
lmdeploy check_env
```

By default, TP in Turbomind uses NCCL for multi-GPU communication, and this may get stuck due to an incorrect NCCL environment setup. You can go through the following checklist to help...
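As an illustration, a hedged sketch of how to surface NCCL activity before a hang; the environment variables are standard NCCL knobs, and the model path is a placeholder:

```python
import os

# Enable verbose NCCL logging so a hang points at the failing transport.
# These must be set before NCCL is initialized.
os.environ["NCCL_DEBUG"] = "INFO"
# If the logs implicate peer-to-peer or InfiniBand transports, these
# toggles can isolate the culprit (at a performance cost):
# os.environ["NCCL_P2P_DISABLE"] = "1"
# os.environ["NCCL_IB_DISABLE"] = "1"

from lmdeploy import pipeline, TurbomindEngineConfig

# tp=2 shards the model across two GPUs; NCCL carries the traffic between them.
pipe = pipeline("internlm/internlm2_5-7b-chat",  # placeholder model path
                backend_config=TurbomindEngineConfig(tp=2))
print(pipe(["Hello"])[0].text)
```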

I have just written a simple project that builds flash_attn 3 wheels weekly. For anyone interested, you can visit https://github.com/windreamer/flash-attention3-wheels for more details. You can also install via...

We do not have this kind of device to verify, but you can build LMDeploy on Jetson in a similar way. You need to use an NVIDIA SBSA base image...

My understanding is that, for performance reasons, the streaming output the user ultimately receives is at block granularity: each emission carries the decoded result of one or more blocks. It is not the case, as you understood it, that the result of every diffusion step inside a block is returned to the user in real time. @grimoire is probably more familiar with the specifics.
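To show what this means on the client side, a minimal consumption sketch, assuming the standard OpenAI-compatible streaming API; each streamed chunk may carry the text of one or more decoded blocks rather than per-diffusion-step deltas (the base URL and model name are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")

stream = client.chat.completions.create(
    model="your-served-model",  # placeholder
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)
# Each chunk arrives once a block (or several) has finished decoding.
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
```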

Can you elaborate a bit more on why support for LMCache is necessary? From my point of view, LMDeploy already has:

- a built-in KV cache management system...
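For reference, a hedged sketch of configuring that built-in KV cache, assuming the cache_max_entry_count and enable_prefix_caching fields present in recent TurbomindEngineConfig releases; the model path is a placeholder:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

engine_config = TurbomindEngineConfig(
    # Fraction of free GPU memory reserved for the KV cache.
    cache_max_entry_count=0.8,
    # Reuse cached KV blocks across requests that share a prompt prefix.
    enable_prefix_caching=True,
)
pipe = pipeline("internlm/internlm2_5-7b-chat", backend_config=engine_config)
```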

> Does this offer the ability to offload layers to CPU and have the KV cache shared efficiently? When I tried last it didn't?

In my opinion, the reason that...