Mooncake [Performance]: Is there any way to measure network transmission latency for each LLM request with Mooncake or Transfer Engine?

Describe your performance question

Just like what is mentioned in the title above.

Before submitting a new issue...

[ ] Make sure you already searched for relevant issues and read the documentation

Nov 21 '25 13:11 BilyZ98

Transfer Engine is not aware of upper-layer LLM requests, so per-request network latency has to be measured on the inference side. Mooncake Store, however, can record the latency of each put/get operation, and I think you can use those metrics for per-request profiling.
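A minimal sketch of inference-side measurement, assuming the serving code knows which request a transfer belongs to. `RequestTransferTimer`, `track`, and `fake_transfer` are hypothetical names used for illustration only; none of them are part of the Mooncake or Transfer Engine API, and `fake_transfer` stands in for whatever transfer call the inference engine actually makes.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class RequestTransferTimer:
    """Accumulates wall-clock transfer time per request ID (hypothetical helper)."""

    def __init__(self):
        # request_id -> list of elapsed seconds, one entry per timed transfer
        self._samples = defaultdict(list)

    @contextmanager
    def track(self, request_id):
        # Time everything executed inside the `with` block, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self._samples[request_id].append(time.perf_counter() - start)

    def count(self, request_id):
        # Number of timed transfers recorded for this request.
        return len(self._samples.get(request_id, ()))

    def total_seconds(self, request_id):
        # Sum of all timed transfers recorded for this request.
        return sum(self._samples.get(request_id, ()))


def fake_transfer():
    # Placeholder for a real KV-cache transfer call (assumption, not a Mooncake API).
    time.sleep(0.01)


timer = RequestTransferTimer()
with timer.track("req-1"):
    fake_transfer()
print(f"req-1: {timer.count('req-1')} transfer(s), "
      f"{timer.total_seconds('req-1') * 1e3:.2f} ms total")
```

This measures the time the caller spends waiting on the transfer, which includes any queuing or software overhead on top of the wire time.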

Nov 23 '25 07:11 stmatengss

Throughput is reported in the TE metrics reporting thread now. Perhaps we could also report the completion time for each transfer batch?

Nov 24 '25 03:11 staryxchen

That would work, and we could implement a histogram-based latency monitor to keep the overhead low.
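To illustrate the idea, here is a generic sketch of a fixed-bucket latency histogram. It is not the Transfer Engine implementation; the class name, method names, and bucket bounds are all assumptions. Recording a sample only increments one counter, so no per-sample data is stored, and percentiles are estimated as bucket upper bounds.

```python
import bisect


class LatencyHistogram:
    """Fixed-bucket latency histogram (hypothetical sketch, not Mooncake code)."""

    def __init__(self, bounds_us=(50, 100, 200, 500, 1000, 5000, 10000, 50000)):
        # Sorted, inclusive upper bounds of each bucket, in microseconds.
        self.bounds_us = tuple(bounds_us)
        # One counter per bucket, plus a final overflow bucket.
        self.counts = [0] * (len(self.bounds_us) + 1)
        self.total = 0

    def record(self, latency_us):
        # Binary search for the first bucket whose upper bound is >= the sample.
        idx = bisect.bisect_left(self.bounds_us, latency_us)
        self.counts[idx] += 1
        self.total += 1

    def percentile_upper_bound(self, p):
        # Upper bound of the bucket that contains the p-th percentile.
        if self.total == 0:
            return None
        target = p / 100.0 * self.total
        running = 0
        for idx, count in enumerate(self.counts):
            running += count
            if running >= target:
                if idx < len(self.bounds_us):
                    return self.bounds_us[idx]
                return float("inf")  # sample landed in the overflow bucket
        return float("inf")


hist = LatencyHistogram()
for sample_us in (40, 120, 130, 900, 7000):
    hist.record(sample_us)
print("p50 <=", hist.percentile_upper_bound(50), "us")
```

The trade-off is resolution: a percentile is only known to within its bucket, so the bounds should be chosen to match the latency range that matters.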

Nov 24 '25 12:11 stmatengss

Good suggestion. I'll implement it later.

Nov 24 '25 12:11 staryxchen

@stmatengss I have implemented reporting of task completion latency distributions in PR #1130, PTAL.

Nov 27 '25 13:11 staryxchen