Yihua Cheng
@XinyuJiangCMU Hey, thanks for your interest! Let me assign it to you. Looking forward to your PR!
I will take a look at this!
@maobaolong I think there is another ongoing effort for CPU offloading: #19854
@chenqianfzh @rainj-me Just curious, how much overhead would it introduce if we do not save the KV cache but instead let the decoding instance decode 1 token?
@hickeyma Hey Martin, I think this PR is no longer needed since it will not be used with the latest vLLM. @chenqianfzh @rainj-me Please let us know if we can...
@wangxiaoyang-dev Good catch! This is a bug. Feel free to create a PR to fix this.
I think we can just comment out that `return`.