Teng Ma comments

Results: 64 comments of Teng Ma

[store] add pybind for get_replica_desc

Please double-check this PR. CC. @ykwd @YiXR

feat: add PCIe Relaxed Ordering (RO) support and RDMA traffic class (…

@alogfans Could you give some suggestions?

[RFC]: More KVCache metrics in both master/client side

> ### Changes proposed
> **Motivation**
>
> As mentioned in [How did SGLang HiCache with Mooncake Backend calculate cache hit ratio](https://github.com/sgl-project/sglang/discussions/11672), Mooncake already has metrics to record some performance information,...

[Usage]: How to compile/run nixl bench using mooncake TE as backend

Maybe we should support building mooncake-based nixl without etcd? @alogfans

[RFC]: Add Local Cache Mechanism for Mooncake Store Client

Good idea. LGTM. Looking forward to your PR! Local caching can improve performance, and we should consider making this feature optional for the generic client.

[RFC]: Add Local Cache Mechanism for Mooncake Store Client

Another idea: it seems we should support two kinds of clients. One is a standalone client without a master. The other is a distributed deployment with many clients and a master.

[RoadMap] Mooncake Transfer Engine NEXT

> In this version, the H100 machine does not have RDMA. Is it feasible to use NVLINK for transmission between nodes within a single machine?

[@alogfans](https://github.com/alogfans) Coming soon....

[Bug]: RDMA Device Misidentification in Container Environment

> [@stmatengss](https://github.com/stmatengss) I will fix this bug because I have the environment

Thx!

[Performance]: Is there any way to measure network transmission latency for each llm request with mooncake or transfer engine?

Transfer Engine does not perceive the upper-layer LLM request, so you should measure per-request network latency on the inference side. However, the latency can be recorded for each put/get operation...
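
A minimal sketch of what measuring on the inference side could look like: the caller wraps each put/get in a timer and tags the sample with its own request ID, since the transfer layer does not know about requests. Everything here is an assumption for illustration. `LatencyRecorder` and `DictStore` are hypothetical names, and `DictStore` is an in-memory stand-in, not the Mooncake Store or Transfer Engine API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class LatencyRecorder:
    """Collects wall-clock latencies (seconds) keyed by (operation, request_id)."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def measure(self, op, request_id=None):
        # Time the enclosed block and record it even if the block raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples[(op, request_id)].append(time.perf_counter() - start)

    def total_for_request(self, request_id):
        # Sum every recorded sample that carries this request ID.
        return sum(
            sum(values)
            for (_, rid), values in self.samples.items()
            if rid == request_id
        )


class DictStore:
    """Hypothetical in-memory stand-in for a KV store client (not a real API)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


recorder = LatencyRecorder()
store = DictStore()

# The inference side knows the request ID, so it attaches it to each operation.
with recorder.measure("put", request_id="req-1"):
    store.put("kv-block-0", b"\x00" * 1024)
with recorder.measure("get", request_id="req-1"):
    value = store.get("kv-block-0")
```

With a real client, the same wrapper would surround the actual transfer calls, and `recorder.total_for_request("req-1")` would give the summed transfer time attributed to that request.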

[Performance]: Is there any way to measure network transmission latency for each llm request with mooncake or transfer engine?

> > Transfer Engine does not perceive the upper-layer LLM request, so you should measure per-request network latency on the inference side. However, the latency can be recorded for each...