Yihua Cheng comments

Results: 77 comments of Yihua Cheng

Whether multiple instances of prefill and decode are supported with vLLM?

@soulseen Thanks for your interest! Right now we support an Xp1d setup (X prefillers, 1 decoder). Please take a look at #528. We will update the documentation soon.

Whether multiple instances of prefill and decode are supported with vLLM?

@soulseen Just a quick update: the documentation for Xp1d is online now at https://docs.lmcache.ai/disaggregated_prefill/nixl/xpyd.html (it's named XpYd, but only 1 decoder instance is supported for now).

Whether multiple instances of prefill and decode are supported with vLLM?

@soulseen Yeah, it's compatible

Whether multiple instances of prefill and decode are supported with vLLM?

@soulseen Seems like there are some problems with the underlying UCX connection (which is used by NIXL) ``` [2025-06-13 08:28:11,553] LMCache INFO: Storing KV cache for 5 out of 5...

Whether multiple instances of prefill and decode are supported with vLLM?

@soulseen Is there any error log in terminal 2?

Whether multiple instances of prefill and decode are supported with vLLM?

@AsicDyc Hey, we don't use NIXL when doing CPU offloading. The high level difference is that TP=2 can have more GPU memory for KV cache than 2x TP=1, which means...
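
To make the memory comparison above concrete, here is a rough back-of-the-envelope sketch. It assumes that tensor parallelism shards the model weights evenly across the GPUs of one instance, and it ignores activations and runtime overhead. The function name and all numbers (80 GiB GPUs, 60 GiB of weights) are hypothetical and only illustrate the arithmetic.

```python
def kv_cache_budget_gib(gpu_mem_gib, weights_gib, tp):
    """Rough KV cache budget (GiB) for ONE instance that spans `tp` GPUs.

    Assumption: weights are sharded evenly across the `tp` GPUs, and
    everything left over on each GPU can be used for KV cache.
    """
    per_gpu_weights = weights_gib / tp
    per_gpu_free = gpu_mem_gib - per_gpu_weights
    if per_gpu_free < 0:
        raise ValueError("model weights do not fit on the GPUs")
    return per_gpu_free * tp


# Hypothetical numbers: two 80 GiB GPUs, a model with 60 GiB of weights.
one_tp2 = kv_cache_budget_gib(80, 60, tp=2)      # one instance over both GPUs
two_tp1 = 2 * kv_cache_budget_gib(80, 60, tp=1)  # two independent instances

print(f"1x TP=2: {one_tp2} GiB of KV cache in a single pool")
print(f"2x TP=1: {two_tp1} GiB in total, split into two separate pools")
```

Under these assumptions the single TP=2 instance keeps only one copy of the weights, so more GPU memory is left for KV cache than with two TP=1 instances that each hold a full copy.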

Whether multiple instances of prefill and decode are supported with vLLM?

@AsicDyc Right now we only have NIXL for PD disaggregation. We do support 2x TP=1, but we require the prefiller and the decoder to have the same TP.
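
The constraints mentioned in this thread (one decoder for now, and the same TP on the prefiller and decoder side) could be checked before launching with something like the sketch below. This is not LMCache or vLLM code: `Instance` and `check_pd_layout` are hypothetical names made up for illustration, and the rules only reflect what is stated in these comments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    role: str  # "prefiller" or "decoder" (hypothetical labels)
    tp: int    # tensor-parallel size of this instance


def check_pd_layout(instances):
    """Return a list of problems; an empty list means the layout looks OK."""
    problems = []
    prefillers = [i for i in instances if i.role == "prefiller"]
    decoders = [i for i in instances if i.role == "decoder"]

    if not prefillers:
        problems.append("need at least one prefiller")
    if len(decoders) != 1:
        problems.append(f"expected exactly 1 decoder, got {len(decoders)}")

    # Prefiller and decoder instances must all use the same TP size.
    tp_sizes = {i.tp for i in prefillers + decoders}
    if len(tp_sizes) > 1:
        problems.append(f"mixed TP sizes: {sorted(tp_sizes)}")
    return problems


# 2 prefillers + 1 decoder, all TP=1: passes the checks.
ok_layout = [Instance("prefiller", 1), Instance("prefiller", 1), Instance("decoder", 1)]
# TP=2 prefiller with a TP=1 decoder: rejected.
bad_layout = [Instance("prefiller", 2), Instance("decoder", 1)]

print(check_pd_layout(ok_layout))
print(check_pd_layout(bad_layout))
```

Returning a list of problems instead of raising on the first one makes it easier to report every violated constraint at once.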

[Core] NIXL integration follow ups

@lengrongfu Hey, we don't have this right now. What's the use case for using etcd?

[Core] NIXL integration follow ups

@lengrongfu Right now we use ZMQ to exchange the NIXL agent information directly between the agents.

[Bug]: AssertionError, assert prefill_metadata.context_chunk_seq_tot is not None

mark (thanks for the fix @DefTruth)