[Performance]: Is there any way to measure network transmission latency for each llm request with mooncake or transfer engine?
Describe your performance question
Just like what is mentioned in the title above.
Before submitting a new issue...
- [ ] Make sure you already searched for relevant issues and read the documentation
Transfer Engine is not aware of the upper-layer LLM request, so you should measure per-request network latency on the inference side. However, latency can be recorded for each put/get operation in Mooncake Store. I think you can use these metrics for per-request profiling.
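For the inference-side measurement, here is a minimal sketch of timing each put/get per request. It assumes the serving stack already holds some Mooncake store client exposing `put`/`get`; the wrapper class and names below are illustrative only, not part of Mooncake.

```python
# Minimal sketch: wrap per-request put/get calls on the inference side with
# wall-clock timers. `store` stands for whatever store client the serving
# stack already holds (hypothetical here); only the timing pattern matters.
import time
from collections import defaultdict

class RequestLatencyRecorder:
    """Accumulates put/get latencies keyed by LLM request id."""

    def __init__(self):
        self._samples = defaultdict(list)  # request_id -> [(op, seconds), ...]

    def timed_put(self, store, request_id, key, value):
        start = time.perf_counter()
        result = store.put(key, value)          # hypothetical store call
        self._samples[request_id].append(("put", time.perf_counter() - start))
        return result

    def timed_get(self, store, request_id, key):
        start = time.perf_counter()
        value = store.get(key)                  # hypothetical store call
        self._samples[request_id].append(("get", time.perf_counter() - start))
        return value

    def summary(self, request_id):
        samples = self._samples.get(request_id, [])
        return {"ops": len(samples), "total_s": sum(t for _, t in samples)}
```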
Throughput is now reported by the TE metrics reporting thread. Perhaps we could also report the completion time of each transfer batch?
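As a rough illustration of that idea (the structure and names below are assumptions, not the actual TE metrics thread), per-batch completion times could be recorded by the transfer workers and drained by the same periodic reporter that prints throughput:

```python
# Sketch only: workers record each batch's completion time into a shared,
# lock-protected buffer; a periodic background thread (analogous to the TE
# reporting thread) drains it and prints a summary alongside throughput.
import threading
import time

class BatchMetrics:
    def __init__(self, report_interval_s=10.0):
        self._lock = threading.Lock()
        self._batch_times = []          # completion time (s) per transfer batch
        self._bytes = 0                 # bytes moved since the last report
        self._interval = report_interval_s
        threading.Thread(target=self._report_loop, daemon=True).start()

    def record_batch(self, nbytes, completion_time_s):
        with self._lock:
            self._batch_times.append(completion_time_s)
            self._bytes += nbytes

    def _report_loop(self):
        while True:
            time.sleep(self._interval)
            with self._lock:
                times, self._batch_times = self._batch_times, []
                nbytes, self._bytes = self._bytes, 0
            if times:
                print(f"throughput={nbytes / self._interval / 1e9:.2f} GB/s "
                      f"batches={len(times)} "
                      f"avg_completion={sum(times) / len(times) * 1e3:.2f} ms "
                      f"max_completion={max(times) * 1e3:.2f} ms")
```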
That works, and we could implement a histogram-based latency monitor to reduce the overhead.
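A minimal sketch of such a histogram-based monitor (bucket boundaries and names are illustrative only): each sample costs one binary search plus an integer increment, and no per-sample data is retained.

```python
# Sketch: bucket each latency sample into fixed bins instead of storing
# every value, then read approximate percentiles from the bucket counts.
import bisect

class LatencyHistogram:
    # Bucket upper bounds in microseconds; the last bucket catches overflow.
    BOUNDS_US = [10, 50, 100, 500, 1_000, 5_000, 10_000, 50_000, 100_000]

    def __init__(self):
        self.counts = [0] * (len(self.BOUNDS_US) + 1)

    def record(self, latency_us):
        # O(log n) bucket lookup, O(1) increment.
        self.counts[bisect.bisect_left(self.BOUNDS_US, latency_us)] += 1

    def percentile(self, p):
        """Approximate p-th percentile (0-100) as a bucket upper bound."""
        total = sum(self.counts)
        if total == 0:
            return None
        target, running = total * p / 100.0, 0
        for bound, count in zip(self.BOUNDS_US + [float("inf")], self.counts):
            running += count
            if running >= target:
                return bound
        return float("inf")
```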
Good suggestion. I'll implement it later.
@stmatengss I have implemented the functionality to report task completion delay distributions in PR #1130, PTAL.