Teng Ma
Please double-check this PR. CC. @ykwd @YiXR
@alogfans Could you give some suggestions?
> ### Changes proposed
> **Motivation**
>
> As mentioned in [How did SGLang HiCache with Mooncake Backend calculate cache hit ratio](https://github.com/sgl-project/sglang/discussions/11672), Mooncake already has metrics to record some performance information,...
Maybe we should support building mooncake-based nixl without etcd? @alogfans
Good idea. LGTM. Looking forward to your PR! Local caching can improve performance, and we should consider making this feature optional for the generic client.
Another idea: it seems we should support two kinds of clients. One is a standalone client without a master. The other is a distributed deployment with many clients and a master.
> In this version, the H100 machine does not have RDMA. Is it feasible to use NVLINK for transmission between nodes within a single machine? [@alogfans](https://github.com/alogfans) Coming soon....
> [@stmatengss](https://github.com/stmatengss) I will fix this bug because I have the environment

Thx!
Transfer Engine does not perceive the upper-layer LLM request, so you should measure per-request network latency on the inference side. However, the latency can be recorded for each put/get operation...
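For illustration only, here is a minimal sketch of recording latency per put/get call on the caller side. It assumes a store object that exposes `put(key, value)` and `get(key)`; `LatencyRecorder` and `DictStore` are hypothetical names made up for this example and are not part of the Mooncake API.

```python
import time
from collections import defaultdict


class LatencyRecorder:
    """Wraps any object exposing put/get and records per-call latency.

    Hypothetical helper for illustration; not a Mooncake class.
    """

    def __init__(self, store):
        self._store = store
        # Operation name -> list of elapsed times in seconds.
        self.samples = defaultdict(list)

    def _timed(self, op, *args):
        start = time.perf_counter()
        try:
            return getattr(self._store, op)(*args)
        finally:
            # Record the sample even if the underlying call raises.
            self.samples[op].append(time.perf_counter() - start)

    def put(self, key, value):
        return self._timed("put", key, value)

    def get(self, key):
        return self._timed("get", key)


class DictStore:
    """In-memory stand-in for a real store (assumption for the demo)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


recorder = LatencyRecorder(DictStore())
recorder.put("block-0", b"kv-cache-bytes")
value = recorder.get("block-0")
for op, times in recorder.samples.items():
    print(f"{op}: {len(times)} call(s), last {times[-1] * 1e6:.1f} us")
```

Per-request latency would still have to be aggregated on the inference side, for example by summing the samples that belong to one request.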
> > Transfer Engine does not perceive the upper-layer LLM request, so you should measure per-request network latency on the inference side. However, the latency can be recorded for each...