agent-lightning icon indicating copy to clipboard operation
agent-lightning copied to clipboard

Fix training metrics before and after processing

Open hzy46 opened this issue 2 months ago • 1 comments

We introduce a suffix to distinguish between metrics computed before and after AgentLightning’s post-processing.

"Before" refers to raw reward and advantage values.

"After" refers to values computed following post-processing, which involves:

  • Dropping prompts that exceed the maximum allowed length.
  • Adjusting the batch size to be a multiple of the mini PPO size.

Different suffixes are used to label these two stages accordingly.

image

The suffix _before_processing indicates the raw rewards, returns, and prompt lengths gathered directly from agent traces.
In contrast, the suffix _after_processing refers to the traces that have been filtered and adjusted for training.

hzy46 avatar Oct 13 '25 04:10 hzy46

Please merge from main as there are CI updates.

ultmaster avatar Oct 20 '25 03:10 ultmaster

/ci

ultmaster avatar Oct 31 '25 10:10 ultmaster

🚀 CI Watcher for correlation id-3472526939-mheqrxlo triggered by comment 3472526939 🏃‍♀️ Tracking 2 workflow run(s):

✅ All runs completed.

github-actions[bot] avatar Oct 31 '25 11:10 github-actions[bot]