
[About Credit Assignment] How to implement an action-dependent reward function?

Open Kwen-Chen opened this issue 5 months ago • 1 comments

The credit assignment in the sample code applies a uniform policy to all states. However, I'd like to assign rewards differently based on the specific action taken. What would be a general approach to implement this?

Kwen-Chen avatar Aug 08 '25 10:08 Kwen-Chen

  1. Use the @reward decorator to create spans of intermediate rewards for different actions. See examples/calc_x and tests/test_trace.py for examples.
  2. Copy agentlightning/verl out (yes, you heard me right), and customize the logic here to do whatever you want.

https://github.com/microsoft/agent-lightning/blob/4d98e85e46790da57e2a42833342b9597582da5a/agentlightning/verl/daemon.py#L418

We are planning to support a config option to customize this, but credit assignment usually involves complex logic, while verl only supports declarative config. I suspect the final design might only relieve users of the "copy" step.
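To make the second option more concrete, here is a minimal, library-independent sketch of what "customizing the logic" could look like. It is not the code in daemon.py: `Step`, `uniform_credit`, `action_dependent_credit`, and the `action_weights` mapping are all hypothetical names invented for illustration, and the weighting rule is just one possible choice. It assumes each turn carries an action label and, optionally, an intermediate reward.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    """One agent turn. Hypothetical stand-in, not an agentlightning type."""
    action: str
    intermediate_reward: Optional[float] = None


def uniform_credit(steps: List[Step], final_reward: float) -> List[float]:
    """Baseline: every step receives the same final reward."""
    return [final_reward for _ in steps]


def action_dependent_credit(
    steps: List[Step],
    final_reward: float,
    action_weights: Dict[str, float],
    default_weight: float = 1.0,
) -> List[float]:
    """Assign a per-step reward that depends on the action taken.

    A step with its own intermediate reward keeps that value; otherwise
    the final reward is scaled by a per-action weight (assumed rule).
    """
    credits = []
    for step in steps:
        if step.intermediate_reward is not None:
            credits.append(step.intermediate_reward)
        else:
            weight = action_weights.get(step.action, default_weight)
            credits.append(weight * final_reward)
    return credits


# Example trajectory with made-up action names.
trajectory = [Step("search"), Step("calculate", 0.5), Step("answer")]
weights = {"search": 0.5, "answer": 1.0}
per_step = action_dependent_credit(trajectory, final_reward=1.0, action_weights=weights)
```

In a copied version of the verl integration, a function like this would presumably replace wherever the final reward is spread across turns; how the per-step values map onto token-level rewards depends on the code at the linked line and is not shown here.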

ultmaster avatar Aug 09 '25 00:08 ultmaster