[Performance]: Is there any way to measure network transmission latency for each LLM request with Mooncake or Transfer Engine?

Open BilyZ98 opened this issue 1 month ago • 5 comments

Describe your performance question

Just like what is mentioned in the title above.

Before submitting a new issue...

  • [ ] Make sure you already searched for relevant issues and read the documentation

BilyZ98 avatar Nov 21 '25 13:11 BilyZ98

Transfer Engine has no visibility into upper-layer LLM requests, so per-request network latency has to be measured on the inference side. However, Mooncake Store can record the latency of each put/get operation. I think you can use these metrics for per-request profiling.
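A minimal sketch of measuring on the inference side, assuming the serving code knows which request each transfer belongs to. `RequestTransferTimer` and `fake_get` are hypothetical names used only for illustration; `fake_get` stands in for whatever store get or transfer call the inference engine actually makes, and nothing here is part of the Mooncake API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class RequestTransferTimer:
    """Accumulates wall-clock transfer time per request ID (hypothetical helper)."""

    def __init__(self):
        self._total_s = defaultdict(float)
        self._ops = defaultdict(int)

    @contextmanager
    def measure(self, request_id):
        # Time whatever runs inside the `with` block and charge it to request_id.
        start = time.perf_counter()
        try:
            yield
        finally:
            self._total_s[request_id] += time.perf_counter() - start
            self._ops[request_id] += 1

    def report(self, request_id):
        return {"ops": self._ops[request_id], "total_s": self._total_s[request_id]}


def fake_get(key):
    # Stand-in for a real store get or transfer call (assumption, not a real API).
    time.sleep(0.001)
    return b"kv-block"


timer = RequestTransferTimer()
for key in ("layer0", "layer1"):
    with timer.measure("req-42"):
        fake_get(key)
print(timer.report("req-42"))
```

The timer wraps the call site rather than the transport, so it captures what the request actually waited for, including any queuing in front of the transfer.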

stmatengss avatar Nov 23 '25 07:11 stmatengss

Throughput is reported in the TE metrics reporting thread now. Perhaps we could also report the completion time for each transfer batch?

staryxchen avatar Nov 24 '25 03:11 staryxchen

That would work, and we can implement a histogram-based latency monitor to keep the overhead low.
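To illustrate why a histogram keeps overhead low, here is a rough sketch of a fixed-bucket latency monitor: each sample costs one binary search and one counter increment, and memory stays constant regardless of sample count. The class name, bucket bounds, and units are assumptions chosen for illustration, not what Transfer Engine implements.

```python
import bisect
import threading


class LatencyHistogram:
    """Fixed-bucket latency histogram (illustrative sketch, hypothetical names)."""

    def __init__(self, bounds_us=(50, 100, 250, 500, 1000, 5000, 10000)):
        # Each bound is the inclusive upper edge of a bucket, in microseconds.
        self._bounds = list(bounds_us)
        self._counts = [0] * (len(self._bounds) + 1)  # last bucket is overflow
        self._lock = threading.Lock()

    def record(self, latency_us):
        idx = bisect.bisect_left(self._bounds, latency_us)
        with self._lock:
            self._counts[idx] += 1

    def snapshot(self):
        with self._lock:
            return list(self._counts)

    def percentile_upper_bound(self, p):
        # Returns the upper edge of the bucket containing the p-th percentile.
        counts = self.snapshot()
        total = sum(counts)
        if total == 0:
            return None
        seen = 0
        for idx, count in enumerate(counts):
            seen += count
            if seen * 100 >= p * total:
                if idx < len(self._bounds):
                    return self._bounds[idx]
                return float("inf")
        return float("inf")


hist = LatencyHistogram()
for sample_us in (40, 60, 80, 300, 20000):
    hist.record(sample_us)
print(hist.snapshot())
print(hist.percentile_upper_bound(50))
```

The trade-off is resolution: a percentile is only known up to its bucket edge, so the bounds should be chosen around the latencies that matter for the workload.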

stmatengss avatar Nov 24 '25 12:11 stmatengss

Good suggestion. I'll implement it later.

staryxchen avatar Nov 24 '25 12:11 staryxchen

@stmatengss I have implemented reporting of task completion latency distributions in PR #1130, PTAL.

staryxchen avatar Nov 27 '25 13:11 staryxchen