agent-lightning Fix training metrics before and after processing

We introduce suffixes to distinguish metrics computed before AgentLightning’s post-processing from those computed after it.

"Before" refers to raw reward and advantage values.

"After" refers to values computed following post-processing, which involves:

Dropping prompts that exceed the maximum allowed length.
Adjusting the batch size to be a multiple of the PPO mini-batch size.

Each stage is labeled with its own suffix:

The suffix _before_processing marks metrics computed on the raw rewards, returns, and prompt lengths gathered directly from agent traces.
In contrast, the suffix _after_processing marks the same metrics computed on the traces that remain after filtering and adjustment for training.
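
For reviewers, here is a minimal sketch of how the two suffixes could be applied. It is not the code in this PR. The `Trace` class, the function names, and the metric keys (`n_traces`, `reward_mean`, `prompt_length_max`) are hypothetical. It also assumes that "adjusting" the batch size means truncating to the nearest lower multiple of the mini-batch size, which the description does not confirm.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trace:
    """Hypothetical stand-in for one agent trace."""
    prompt_length: int
    reward: float


def post_process(traces, max_prompt_length, mini_batch_size):
    """Drop over-long prompts, then trim to a multiple of mini_batch_size.

    Assumption: trimming is done by truncation, not padding.
    """
    kept = [t for t in traces if t.prompt_length <= max_prompt_length]
    usable = len(kept) - len(kept) % mini_batch_size
    return kept[:usable]


def summarize(traces, suffix):
    """Compute a few simple metrics and tag every key with the suffix."""
    if not traces:
        return {f"n_traces{suffix}": 0}
    return {
        f"n_traces{suffix}": len(traces),
        f"reward_mean{suffix}": mean(t.reward for t in traces),
        f"prompt_length_max{suffix}": max(t.prompt_length for t in traces),
    }


def collect_metrics(traces, max_prompt_length, mini_batch_size):
    """Report the same metrics for the raw and the post-processed traces."""
    metrics = summarize(traces, "_before_processing")
    processed = post_process(traces, max_prompt_length, mini_batch_size)
    metrics.update(summarize(processed, "_after_processing"))
    return metrics
```

Logging both sets side by side shows how much the filtering shifts the reward statistics, for example when long prompts correlate with low rewards.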

Oct 13 '25 04:10 hzy46

Please merge from main as there are CI updates.

Oct 20 '25 03:10 ultmaster

/ci

Oct 31 '25 10:10 ultmaster

🚀 CI Watcher for correlation id-3472526939-mheqrxlo triggered by comment 3472526939 🏃‍♀️ Tracking 2 workflow run(s):

🔴 PR #145 - Label ci-spider - id-3472526939-mheqrxlo — completed/failure
🟢 PR #145 - Label ci-calc-x - id-3472526939-mheqrxlo — completed/success

✅ All runs completed.

Oct 31 '25 11:10 github-actions[bot]