Yihua Cheng
@XinyuJiangCMU Hey, thanks for your interest! Let me assign it to you. Looking forward to your PR!
I will take a look at this!
@maobaolong I think there is another ongoing effort for CPU offloading: #19854
@chenqianfzh @rainj-me Just curious, how much overhead would it introduce if we do not save the KV cache but instead let the decoding instance decode 1 token?
@hickeyma Hey Martin, I think this PR is no longer needed since it will not be used with the latest vLLM. @chenqianfzh @rainj-me Please let us know if we can...
@wangxiaoyang-dev Good catch! This is a bug. Feel free to create a PR to fix this.
I think we can just comment out that `return`.