Better support from LoadGen to help submitters find the optimal server QPS
Options:
- [ ] Testing LoadGen's find peak performance mode
- [ ] Log additional advice related to possible increase of QPS
Pablo: The peak perf mode seems to add extra run time. Is that acceptable?
Arjun: Even if it takes 5-10x longer, it would be useful if submitters can reach peak performance without manual exploration.
Pablo: How do we want to solve this issue?
Arjun: Short documentation on how submitters can use this mode would be useful.
Anton: How is this implemented? You set a target QPS, and if it passes, LoadGen may decide to raise it a bit, so some kind of search is involved. Do you set a range or a step size? Tuning just a single parameter may be somewhat useful, but server tuning has multivariate effects.
Pablo: As I understand it, it only tunes QPS. I need to check exactly how it searches; we can include that in the documentation.
I just passed the flag, and it needs a starting QPS.
Arjun: It does a binary search, so it needs an upper limit, too, correct?
Pablo: Not sure how it's calculated, but I didn't need to set another limit. I think one of the conditions is that the starting QPS must pass the latency test. It might use that as the upper limit and zero as the lower limit? Not quite sure.
Anton: I suspect this was developed in the early days of LoadGen, so it won't account for LLM constraints. We will need to keep an eye on that. Developing another solution is probably out of scope.
Arjun: We have seen submitters who only tune the target QPS manually. So this mode could have been useful for some submitters.
Anton: I'm somewhat skeptical. Remember, increasing the run time can also improve performance, so that is a second variable.
Arjun: Those run-time gains are relatively small. This mode is not meant to solve that problem; it is for submitters who failed to set the correct server QPS.
Pablo to follow up with documentation.