Anton Lokhmotov


I'm a bit concerned about enabling these tests for Edge systems. Looking at [the v4.0 Edge results](https://mlcommons.org/benchmarks/inference-edge), the SingleStream latency ranged from 2 to 13 seconds per sample. LoadGen seems...

For Datacenter, the Offline throughput ranged from 1.18 QPS to 13.71 QPS. That's up to 75 minutes for a single Performance run.
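As a back-of-the-envelope check (a sketch only: the query count below is a hypothetical placeholder, since the actual number depends on the benchmark's dataset and LoadGen settings; run duration scales as queries / QPS):

```python
# Rough Offline run duration: duration = query_count / throughput.
query_count = 5_000  # hypothetical placeholder, not the benchmark's actual minimum
for qps in (1.18, 13.71):  # the slowest and fastest Offline results quoted above
    print(f"{qps:5.2f} QPS -> {query_count / qps / 60:5.1f} minutes")
```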

Despite us not having decided on this issue, the submission checker already complains about missing TEST04 and TEST05 when the main results and TEST01 are present. I've done a little...

> > > Provided as a log for CUDA devices

Uh? How can an auto-scaler use a log message to "determine the pod autoscaling...

> The `gen_len` here is not the tokens but the characters, I believe. Llama2-70b computes the [`gen_tok_len`](https://github.com/mlcommons/inference/blob/master/language/llama2-70b/evaluate-accuracy.py#L107), which is then used to compute the tokens per sample.

Thanks @attafosu. For...
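To illustrate the character/token distinction, a minimal sketch (the tokenizer checkpoint here is an assumption for illustration; the linked `evaluate-accuracy.py` does the actual accounting):

```python
from transformers import AutoTokenizer

# Hypothetical checkpoint for illustration; the reference script loads its own tokenizer.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-chat-hf")

text = "The quick brown fox jumps over the lazy dog."
gen_len = len(text)                                                  # character count
gen_tok_len = len(tokenizer.encode(text, add_special_tokens=False))  # token count
print(gen_len, gen_tok_len)  # the character count is several times the token count
```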

The main question remains: Why is the maximum number of output tokens fixed at 128? From what we see, the model "wants" to "say" more in practically every case, but...

> From the summary, there's about 4% of ground truth lengths > 128

It looks like close to 100% of generated lengths are > 128, that is, the model is...
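A quick way to check the truncation rate (a sketch: the file name and format are hypothetical, assuming per-sample generated token counts have been dumped somewhere):

```python
import json

# Hypothetical dump of per-sample generated token counts (gen_tok_len values).
with open("gen_tok_lens.json") as f:
    gen_tok_lens = json.load(f)

capped = sum(1 for n in gen_tok_lens if n >= 128)
print(f"{capped / len(gen_tok_lens):.1%} of generated outputs hit the 128-token cap")
```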

Perhaps [this](https://www.linkedin.com/posts/tigranbayburtsyan_if-you-havent-tried-threatening-llms-in-activity-7328882173820792833-5Sie/) might be helpful?

> If you haven't tried "threatening" LLMs in system prompts, then you should!

@sahelib25 Can we please try the reference with max tokens set to e.g. 256?
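Something along these lines, as a sketch only (the model id is a hypothetical stand-in, and where the cap lives in the reference implementation may differ):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model id; the reference implementation wires this up itself.
model_id = "meta-llama/Llama-2-70b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Summarise the following article: ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)  # raised from the fixed 128
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```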