Fix training metrics before and after processing
We introduce a suffix to distinguish between metrics computed before and after AgentLightning’s post-processing.
"Before" refers to raw reward and advantage values.
"After" refers to values computed following post-processing, which involves:
- Dropping prompts that exceed the maximum allowed length.
- Adjusting the batch size to be a multiple of the mini PPO size.
Different suffixes are used to label these two stages accordingly.
The suffix _before_processing indicates the raw rewards, returns, and prompt lengths gathered directly from agent traces.
In contrast, the suffix _after_processing refers to the traces that have been filtered and adjusted for training.
Please merge from main as there are CI updates.
/ci
🚀 CI Watcher for correlation id-3472526939-mheqrxlo triggered by comment 3472526939 🏃♀️ Tracking 2 workflow run(s):
- 🔴 PR #145 - Label ci-spider - id-3472526939-mheqrxlo —
completed/failure - 🟢 PR #145 - Label ci-calc-x - id-3472526939-mheqrxlo —
completed/success
✅ All runs completed.