mobile_app_open icon indicating copy to clipboard operation
mobile_app_open copied to clipboard

Update Loadgen version

Open anhappdev opened this issue 4 months ago • 8 comments

Currently we use the commit 238d035 from Feb 29, 2024.

We should update the Loadgen to latest version. https://github.com/mlcommons/inference/commits/master/loadgen

anhappdev avatar Oct 28 '25 05:10 anhappdev

Image


We currently use the mobile_update branch, which contains commits related to enforcing a maximum duration. This is not in the master branch. Therefore, we have 3 options:

  1. Merge master into the mobile_update branch and maintain our own branch for loadgen.
  2. Use the master branch but apply the commits as a patch.
  3. Open a PR in the inference repo to merge the changes into master.

@freedomtan, @Mostelk, @mohitmundhragithub, @farook-edev, I would like your advice on which path we should take. I personally prefer option 2 (apply a patch) to keep our modifications centralised in this mobile_app_open repo. Option 3 may be the cleanest option, but I'm not sure if we can do that (else we've already gone that way in the past).

anhappdev avatar Oct 28 '25 07:10 anhappdev

Looking at loadgen's documentation, they seem to require any changes to be upstreamed as a rule, does having our own branch satisfy this rule? If not, then I don't think we can go with either options 1 or 2.

The above issue aside, I personally agree that 3 is the cleanest, but if it proves too difficult/bureaucratic, then 1 (and optionally maintaining a fork rather than a branch) would be the next best thing IMO. That's because it keeps loadgen as a dependency that can simply be used by the app, which is cleaner than using a patch IMO.

farook-edev avatar Oct 28 '25 07:10 farook-edev

@anhappdev The current loadgen version we use already has token latencies, we should be able to complete the LLM android implementation without the need for an upgrade.

In the same vein, the API should be similar enough to where we'd be able to upgrade seamlessly without the need for code rewrites.

farook-edev avatar Oct 28 '25 16:10 farook-edev

IMHO, to maintain consistency with our previous approach (before using the mobile_update), we should update the mobile_update branch to a specific “stable” version, say v5.1, instead of the master branch.

freedomtan avatar Nov 11 '25 05:11 freedomtan

https://github.com/mlcommons/inference/tags

anhappdev avatar Nov 11 '25 06:11 anhappdev

Let's try to rebase the mobile_update to the master branch and send a PR. Then we can use future versions like v6.0. @anhappdev

freedomtan avatar Nov 11 '25 06:11 freedomtan

  • https://github.com/mlcommons/inference/pull/2394/

anhappdev avatar Dec 22 '25 02:12 anhappdev

I'm not sure where to report this, but there was a question some weeks back about whether or not LoadGen excludes TTFT from TPS counting. I found this line which confirms that it does in fact only include the duration after the first output token is generated (latency is the variable representing the time until first output token is generated). @freedomtan @Mostelk

farook-edev avatar Jan 06 '26 02:01 farook-edev