Thomas Atta-Fosu
Hi @JustinInAI: At what step do you encounter these errors? Could you post a sample of the LNK2019 error?
@JustinInAI A snapshot/sample of the errors would be useful here. That said, I was able to reproduce a single LNK error (linking x64 ov_mlperf target to x86 mlperf_loadgen.lib). It appears...
@arjunsuresh Yes, that's correct.
The fix looks good to me
@ljk3210 For the Server scenario, the **first-token-latency** is one of the constraints that has to be met. The other constraint is the **time-per-output-token**. Their statistics during the run will also...
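For reference, a rough sketch of how those two statistics could be computed from per-query timing records; the field names and the percentile choice below are my own assumptions, not LoadGen's actual bookkeeping:

```python
import statistics

def server_latency_stats(records):
    """records: list of dicts with 'issue_ts', 'first_token_ts', 'last_token_ts'
    (seconds) and 'n_output_tokens' for each completed query."""
    ttft = [r["first_token_ts"] - r["issue_ts"] for r in records]
    tpot = [
        (r["last_token_ts"] - r["first_token_ts"]) / max(r["n_output_tokens"] - 1, 1)
        for r in records
    ]
    # The constraint check compares a high percentile (e.g. p99) against the limits.
    return {
        "ttft_p99": statistics.quantiles(ttft, n=100)[98],
        "tpot_p99": statistics.quantiles(tpot, n=100)[98],
    }
```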
As discussed in the WG today, it would be great if MLCommons could find a way to share the preprocessed dataset with submitters. @arjunsuresh @pgmpablo157321
The gen_len here is the number of characters, not tokens, I believe. Llama2-70b computes [`gen_tok_len`](https://github.com/mlcommons/inference/blob/master/language/llama2-70b/evaluate-accuracy.py#L107), which is then used to compute the tokens per sample.
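To illustrate the difference (character count vs. token count), here is a minimal sketch; the tokenizer checkpoint and variable names are placeholders, and the real logic lives in the linked evaluate-accuracy.py:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; the benchmark's script loads its own tokenizer.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-chat-hf")

preds = ["A short generated answer.", "Another, somewhat longer, generated answer."]

gen_len = sum(len(p) for p in preds)                              # characters
gen_tok_len = sum(len(tokenizer(p)["input_ids"]) for p in preds)  # tokens
tokens_per_sample = gen_tok_len / len(preds)
print(gen_len, gen_tok_len, tokens_per_sample)
```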
@psyhtest Valid point on the osl distribution. iirc one of the reasons was that without finetuning, the 8B was quite verbose, which is evident from most of the generated outputs...
Hi @psyhtest, one request that came up from the WG is whether you can generate some statistics of the generated output sequence lengths when the max number of output tokens is increased...
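Something along these lines would already be helpful; a sketch of the kind of summary I have in mind (the variable names and the max-token cap are placeholders):

```python
import numpy as np

def osl_stats(output_token_ids, max_tokens=1024):
    """output_token_ids: one list of generated token ids per sample."""
    lengths = np.array([len(ids) for ids in output_token_ids])
    return {
        "mean": float(lengths.mean()),
        "p50": float(np.percentile(lengths, 50)),
        "p90": float(np.percentile(lengths, 90)),
        "p99": float(np.percentile(lengths, 99)),
        "max": int(lengths.max()),
        # Fraction of samples that ran into the cap, i.e. were truncated.
        "hit_cap_frac": float((lengths >= max_tokens).mean()),
    }
```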
Thanks @arjunsuresh. After specifying the `target_latency`, the counts are all 5k now. As this may not be obvious to first-time users, should it be stated in the instructions for...
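For anyone hitting the same thing, this is roughly the line I added to `user.conf` (the benchmark/scenario prefix and the value are placeholders; the value is in milliseconds and should come from your own measurements):

```
*.SingleStream.target_latency = 10
```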