Thomas Atta-Fosu
After the change to avoid using memmap when loading the labels (#2081), we're encountering pickling errors from NumPy: we're unable to load the labels with `numpy.load`. The previous loading via...
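A likely cause, sketched below under the assumption that the labels are stored as an object-dtype array (e.g. ragged per-sample label lists): since NumPy 1.16.3, `numpy.load` defaults to `allow_pickle=False` and raises a `ValueError` on pickled object arrays, whereas the old memmap path never hit this check. The file name and array contents here are hypothetical, not from the actual dataset.

```python
import os
import tempfile

import numpy as np

# Hypothetical labels: ragged per-sample lists force an object dtype,
# which NumPy can only serialize by pickling.
labels = np.array([[1, 2, 3], [4, 5]], dtype=object)

path = os.path.join(tempfile.mkdtemp(), "labels.npy")
np.save(path, labels)  # pickling is allowed on save by default

# Plain load (allow_pickle=False by default) refuses pickled object arrays.
try:
    np.load(path)
except ValueError as exc:
    print("load failed:", exc)

# Opting in restores loading, at the cost of trusting the file's pickle data.
loaded = np.load(path, allow_pickle=True)
print(len(loaded))
```

If this is the failure mode, the fix is either to pass `allow_pickle=True` at the load site (only safe for trusted files) or to store the labels in a non-object layout (e.g. padded arrays) so no pickling is needed.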
For 405B the sampling parameter config sets the [max output tokens](https://github.com/mlcommons/inference/blob/master/language/llama3.1-405b/SUT_VLLM.py#L75) to be 20k. However, given the reference output distribution with max output length of 1.7k, I don't think we...
@arjunsuresh @pgmpablo157321 While doing a run in SingleStream performance mode, I noticed that the number of samples being run is much higher than the 5k that was designated by...