Pablo Gonzalez
Options:
- [ ] Testing LoadGen's find peak performance mode (a minimal sketch follows this list)
- [ ] Log additional advice related to a possible increase of QPS
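For the first option, a minimal sketch of exercising find-peak-performance mode through the `mlperf_loadgen` Python bindings; the starting QPS value is an arbitrary placeholder, and the SUT/QSL wiring is elided:

```python
# Minimal sketch: run LoadGen in find-peak-performance mode for the
# Server scenario. Assumes the mlperf_loadgen Python bindings are
# installed; SUT and QSL construction is elided.
import mlperf_loadgen as lg

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Server
settings.mode = lg.TestMode.FindPeakPerformance
settings.server_target_qps = 1.0  # placeholder starting QPS guess

# With a SUT and QSL in hand, lg.StartTest(sut, qsl, settings)
# would then search upward for the peak sustainable QPS.
```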
How do we [incentivize](https://github.com/mlcommons/submissions_inference_v5.0/issues/65#issuecomment-2741866193) cooperative behavior and punish uncooperative behavior? E.g.:
- A peer review performed by the next meeting: +1 point?
- A peer review not performed by the...
Post-mortem task proposed by @nv-ananjappa
For example: if no quantization is done, it can just be mentioned in the document.
Post-mortem task proposed by @arjunsuresh
Options:
- [ ] Send the size of the original dataset to the SUT
- [ ] Verification of the order in which queries were responded to by LoadGen (a rough offline check is sketched below). Potentially introduce a...
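For the second option, a rough sketch of an offline order check against the accuracy log; the file name and field names follow the usual LoadGen accuracy-log layout but should be treated as assumptions here:

```python
# Hypothetical order-verification sketch: read mlperf_log_accuracy.json
# and inspect the order in which responses were logged, by qsl_idx.
# The file name and field names are assumptions based on the usual
# LoadGen accuracy-log layout.
import json

with open("mlperf_log_accuracy.json") as f:
    entries = json.load(f)

logged_order = [e["qsl_idx"] for e in entries]
print("first responses, by qsl_idx:", logged_order[:20])
```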
- Add a `server_constant_gen` parameter to `test_settings`
- Add a `server_constant_gen` parameter to the Python API (a usage sketch follows this list)
- Allow `server_constant_gen` to be loaded from the config files (`mlperf.conf` and `user.conf`)...
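A hedged sketch of how the parameter might surface once all three pieces land; the attribute name, config key, and model name mirror existing LoadGen conventions but are assumptions until the change merges:

```python
# Sketch (assumption: server_constant_gen lands as a TestSettings
# field and a config key, mirroring existing LoadGen conventions).
import mlperf_loadgen as lg

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Server

# Via the Python API (hypothetical field name):
settings.server_constant_gen = True

# Or via a config file, e.g. a user.conf line such as:
#   *.Server.server_constant_gen = 1
# "llama2-70b" is a placeholder model name.
settings.FromConfig("user.conf", "llama2-70b", "Server")
```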
The following issues appear when running the LLM reference implementation.

Multiple GPUs issue:

```
(VllmWorkerProcess pid=1795) ERROR 12-03 18:49:03 multiproc_worker_utils.py:231] Exception in worker VllmWorkerProcess while processing method init_device: Cannot re-initialize...
```
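The truncated message looks like the usual CUDA fork/re-initialization failure; if so, one common workaround (an assumption, since the full traceback is cut off) is to force the spawn start method:

```python
# Possible workaround sketch, assuming the truncated error is the
# usual "cannot re-initialize CUDA in a forked subprocess" failure.
import os

# vLLM honors this environment variable for its worker start method;
# it must be set before vLLM is imported.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

import multiprocessing as mp

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
```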
The following issues appear when running the LLM reference implementation.

Dependencies in the Docker container:

```
Collecting mistral-common>=1.4.4 (from mistral-common[opencv]>=1.4.4->vllm==0.6.3->-r requirements.txt (line 8))
  Downloading mistral_common-1.5.0-py3-none-any.whl.metadata (4.6 kB)
  Downloading mistral_common-1.4.4-py3-none-any.whl.metadata (4.6...
```
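If the resolver is backtracking across `mistral-common` releases, one mitigation (a hypothetical pin; verify the version is actually compatible with `vllm==0.6.3` before adopting) is to pin the transitive dependency explicitly:

```
# requirements.txt sketch (hypothetical pin; verify compatibility
# with vllm==0.6.3 before adopting)
vllm==0.6.3
mistral-common[opencv]==1.4.4
```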
#1670

Testing command (outside the inference repo):

```
python -m inference.tools.submission.submission_checker.main --input inference_results_v5.1
```