feat: improve eval summary output with compact layout and better UX
Summary
Redesigns the eval summary output to be more compact, readable, and user-friendly while preserving all information. Reduces output from ~26 lines to ~10 lines and fixes a critical bug where grading tokens weren't displayed in certain scenarios.
Before vs After
Before (26 lines)
✓ Eval complete
ID: eval-xyz-123
» Run promptfoo view to use the local web viewer
» Run promptfoo share to create a shareable URL
» This project needs your feedback. What's one thing we can improve? https://promptfoo.dev/feedback
Total Tokens: 2,927
Eval: 929 (929 prompt, 0 completion, 929 cached)
Grading: 1,998 (1,636 prompt, 362 completion)
Provider Breakdown:
openai:gpt-5: 496 (58 prompt, 438 completion, 384 reasoning)
openai:gpt-5-mini: 433 (58 prompt, 375 completion, 320 reasoning)
Pass Rate: 50.00%
Results: 4 passed, 4 failed, 0 errors
Duration: 9s (concurrency: 4)
After (10 lines)
✓ Eval complete (ID: eval-xyz-123)
» View results: promptfoo view
» Feedback: https://promptfoo.dev/feedback
Total Tokens: 2,927
Eval: 929 (cached)
Grading: 1,998 (1,636 prompt, 362 completion)
Providers:
openai:gpt-5: 496 (0 requests; cached)
openai:gpt-5-mini: 433 (0 requests; cached)
Results: ✓ 4 passed, ✗ 4 failed, 0 errors (50%)
Duration: 9s (concurrency: 4)
After (with --no-share flag)
✓ Eval complete (ID: eval-xyz-123)
» View results: promptfoo view
» Feedback: https://promptfoo.dev/feedback
Total Tokens: 2,927
Eval: 929 (cached)
Grading: 1,998 (1,636 prompt, 362 completion)
Providers:
openai:gpt-5: 496 (0 requests; cached)
openai:gpt-5-mini: 433 (0 requests; cached)
Results: ✓ 4 passed, ✗ 4 failed, 0 errors (50%)
Duration: 9s (concurrency: 4)
(Note: share guidance is hidden when the user explicitly disables sharing.)
Key Changes
Layout Improvements
- Combined completion + ID: `✓ Eval complete (ID: eval-xyz-123)` on a single line
- Visual spacing: Added blank line between provider breakdown and results for better separation
- Compact guidance: Simplified and action-oriented recommendations
UX Improvements
- Milder green: Changed from `greenBright.bold()` to `green.bold()` for better readability
- Stronger view recommendation: "View results: promptfoo view" instead of "Run promptfoo view to use the local web viewer"
- Conditional share guidance: Respects the `--no-share` flag and doesn't show the share line when the user explicitly disabled sharing
- Simplified feedback: "Feedback: https://promptfoo.dev/feedback" instead of long marketing message
Token Display Improvements
- Smart caching display: Shows `929 (cached)` instead of `929 (929 cached)` when 100% cached
- Always show request counts: Displays "0 requests" to explicitly indicate 100% cache hit
- Only show eval breakdown when relevant: Hides "Eval:" line when there are no eval tokens
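The token display rules above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: `TokenUsage`, `groupThousands`, `formatEvalTokens`, and `formatProviderLine` are hypothetical names, and the output format for a provider that is not fully cached is an assumption.

```typescript
// Hypothetical shape for illustration; not promptfoo's actual type.
interface TokenUsage {
  total: number;
  prompt: number;
  completion: number;
  cached: number;
}

// Insert thousands separators (1998 -> "1,998") without relying on locale data.
function groupThousands(n: number): string {
  return n.toString().replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

// Returns the "Eval:" line, or null when there are no eval tokens to show.
function formatEvalTokens(u: TokenUsage): string | null {
  if (u.total === 0) {
    return null; // hide the line entirely
  }
  if (u.cached === u.total) {
    return `Eval: ${groupThousands(u.total)} (cached)`; // 100% cached: compact form
  }
  const parts = [
    `${groupThousands(u.prompt)} prompt`,
    `${groupThousands(u.completion)} completion`,
  ];
  if (u.cached > 0) {
    parts.push(`${groupThousands(u.cached)} cached`);
  }
  return `Eval: ${groupThousands(u.total)} (${parts.join(", ")})`;
}

// Always includes the request count; "0 requests" signals a full cache hit.
// The non-cached format here is an assumption.
function formatProviderLine(
  id: string,
  total: number,
  requests: number,
  cached: number,
): string {
  const fullyCached = total > 0 && cached === total;
  const detail = fullyCached ? `${requests} requests; cached` : `${requests} requests`;
  return `${id}: ${groupThousands(total)} (${detail})`;
}
```

With the numbers from the example output, `formatEvalTokens({ total: 929, prompt: 929, completion: 0, cached: 929 })` yields `Eval: 929 (cached)`.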
Bug Fixes
- CRITICAL: Fixed bug where grading tokens weren't displayed when eval had no provider tokens (e.g., when using `--model-outputs` with llm-rubric assertions)
- Only show "Eval:" breakdown when there are actual eval tokens to display
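One plausible reading of the fix, sketched under assumptions: the token section is gated on the combined total rather than on eval tokens alone, so a grading-only run still reports its usage. `tokenSummaryLines` is a hypothetical helper, not the function changed in this PR, and thousands separators are omitted for brevity.

```typescript
// Hypothetical helper; builds the token section of the summary.
function tokenSummaryLines(evalTotal: number, gradingTotal: number): string[] {
  const lines: string[] = [];
  const total = evalTotal + gradingTotal;

  // Gate on the combined total, not on eval tokens alone.
  if (total === 0) {
    return lines;
  }
  lines.push(`Total Tokens: ${total}`);

  // Each breakdown line appears only when it has something to report.
  if (evalTotal > 0) {
    lines.push(`  Eval: ${evalTotal}`);
  }
  if (gradingTotal > 0) {
    lines.push(`  Grading: ${gradingTotal}`);
  }
  return lines;
}
```

For a run with no eval tokens and 1,998 grading tokens, this returns a total line followed by a grading line, with no "Eval:" line.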
Other Improvements
- Smart pass rate precision: `100%` for exact values, `95.67%` for decimals
- Consistent ✓/✗ symbols for accessibility
- Better visual hierarchy throughout
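The pass rate precision rule could be implemented along these lines. This is a minimal sketch with a hypothetical `formatPassRate` helper. It assumes the caller supplies the denominator, since the description does not say whether errors count toward the total.

```typescript
// Hypothetical helper: whole-number rates print without decimals,
// everything else keeps two decimal places.
function formatPassRate(passed: number, total: number): string {
  if (total === 0) {
    return "0%"; // assumption: report 0% when nothing ran
  }
  const fixed = ((passed / total) * 100).toFixed(2);
  // Rounding first avoids float noise such as 56.99999999999999 printing as "57.00%".
  return fixed.slice(-3) === ".00" ? `${fixed.slice(0, -3)}%` : `${fixed}%`;
}
```

For the example run above, `formatPassRate(4, 8)` returns `50%`, while `formatPassRate(1, 3)` returns `33.33%`.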
Testing
- ✅ All 7,538 tests pass across 421 test suites
- ✅ Tested with various flags: --no-share, --no-write, --no-cache
- ✅ Tested edge cases: zero tokens, grading-only tokens, perfect/zero pass rates
- ✅ Tested with single and multiple providers
Breaking Changes
None. All existing functionality is preserved with improved presentation.
📝 Walkthrough
The change reworks the finalization and reporting flow in doEval by consolidating completion messaging into a single line, replacing separate branches with conditional logic to compute one message for various scenarios. It introduces combined token usage summaries for eval and grading tokens, adds provider breakdown presentation for multiple providers, implements new pass-rate formatting with coloring, and reorganizes the display sequence with improved spacing and section headers. Existing discrete log lines are replaced with concise, bolded summaries and standardized guidance text. Conditions around sharing guidance are adjusted based on explicit user preferences.
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Rationale: This change is confined to a single file but involves multiple interconnected logic modifications across different sections: completion messaging consolidation, token usage calculations, provider breakdown logic, pass-rate formatting, and display sequencing. While the changes follow consistent patterns (refactoring output generation), they span several functional areas requiring separate reasoning to verify correctness of each logic block and their interactions. The mix of conditional logic refinements and formatting adjustments adds moderate complexity without reaching dense logic density across many files.
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The pull request title "feat: improve eval summary output with compact layout and better UX" is fully aligned with the main changes in the changeset. The raw summary and PR description confirm the core change is redesigning the eval summary output to be more compact and readable while preserving functionality. The title is concise (67 characters), uses clear and specific language that accurately describes the primary focus, and follows conventional commit format. A developer scanning the project history would immediately understand this is about improving the output formatting and user experience of eval summaries. |
| Description check | ✅ Passed | The pull request description comprehensively describes the changeset, detailing the redesigned eval summary output, async background sharing feature, layout improvements, UX enhancements, token display improvements, and bug fixes. |
Does this require another review, @mldangelo?