
Feedback Capture: Standardizing the request routing algorithm

Open hanyunfan opened this issue 9 months ago • 0 comments

Suggestion received as outlined below. I've opened an issue to gather feedback on interest in this idea. Please leave a comment if you'd like to see it included in a future MLPerf Inference release.

The submitter implemented request routing within the audited system based on input sequence length. While this approach is effective (compared to a naive round-robin approach), it may be possible to improve efficiency by incorporating node load or system utilization as additional factors in routing decisions. Given the increasing complexity of modern, multi-node inference systems, there is an opportunity for MLCommons to consider standardizing this aspect of the benchmarking environment. Establishing a standardized interface or set of guidelines for request routing would allow all participants to contribute and benchmark innovative algorithms in this area. This could foster joint innovation and more accurately reflect real-world deployment scenarios in MLPerf benchmarking. As multi-node, high-throughput inference becomes the norm, evolving the benchmarking framework in this direction may prove beneficial for the community.
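To make the idea concrete, here is a minimal sketch of a routing policy that combines the submitter's sequence-length signal with per-node load. The node names, the cost model (outstanding input tokens as a proxy for load), and the class/method names are all illustrative assumptions on my part, not part of any MLPerf specification:

```python
class LoadAwareRouter:
    """Hypothetical sketch: route each request to the node with the
    lowest estimated outstanding work, measured in input tokens."""

    def __init__(self, nodes):
        # Estimated outstanding tokens per node.
        self._load = {node: 0 for node in nodes}

    def route(self, seq_len):
        # Pick the least-loaded node, then charge it for this
        # request's input sequence length.
        node = min(self._load, key=self._load.get)
        self._load[node] += seq_len
        return node

    def complete(self, node, seq_len):
        # Release the work estimate when the request finishes.
        self._load[node] -= seq_len
```

A standardized interface along these lines (a `route` hook plus a completion callback, for example) would let submitters plug in and compare alternative policies under identical benchmark conditions.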

hanyunfan avatar Jun 20 '25 20:06 hanyunfan