Fix off‑by‑one when using --max_examples in language/bert/evaluate_v1.1.py
The script provides a `--max_examples` option to limit the number of evaluation samples.
However, `total` is incremented and compared against `max_examples` before the per-example score is recorded.
As a result, when `--max_examples` is specified, the final sample's statistics are skipped and the reported metrics are computed over N - 1 examples instead of N.
This PR moves the `total += 1` and the `if max_examples and max_examples == total: break` logic after the score update, so the last sample contributes to the metrics and the results reflect all N examples.
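For illustration, a minimal sketch of the control-flow change (the names `dataset`, `score`, `scores`, and the stand-in values are illustrative, not the script's exact identifiers):

```python
# Minimal, self-contained sketch of the off-by-one and the fix.
dataset = ["ex1", "ex2", "ex3"]       # stand-in for the evaluation examples
max_examples = 3                      # stand-in for the --max_examples value
score = lambda ex: 1.0                # stand-in for the per-example score

# Before: the counter is checked before the score is recorded,
# so the Nth example is counted toward the cap but never scored.
scores, total = [], 0
for example in dataset:
    total += 1
    if max_examples and max_examples == total:
        break                         # last example's score is dropped
    scores.append(score(example))
print(len(scores))                    # 2 -> metrics averaged over N - 1

# After: record the score first, then check the cap,
# so all N examples contribute to the metrics.
scores, total = [], 0
for example in dataset:
    scores.append(score(example))
    total += 1
    if max_examples and max_examples == total:
        break
print(len(scores))                    # 3 -> metrics averaged over N
```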