agent-lightning
[About Credit Assignment] How to implement an action-dependent reward function?
The credit assignment in the sample code applies a uniform policy to all states. However, I'd like to assign rewards differently based on the specific action taken. What would be a general approach to implement this?
- Use the `@reward` decorator to create spans of intermediate rewards for different actions. See `examples/calc_x` and `tests/test_trace.py` for examples.
- Copy `agentlightning/verl` out (yes, you heard me right) and customize the logic here to whatever you want:
https://github.com/microsoft/agent-lightning/blob/4d98e85e46790da57e2a42833342b9597582da5a/agentlightning/verl/daemon.py#L418
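To illustrate the kind of action-dependent logic you might put into your copied version, here is a minimal, framework-agnostic sketch. It assumes each agent turn can be reduced to an action type plus an optional intermediate reward (for example, one recorded via a reward span). `Step`, `assign_credit`, `ACTION_WEIGHTS`, and the action names are hypothetical and are not part of agent-lightning or verl. The sketch also does not reproduce the daemon's actual data structures, so you would still need to map your traces onto it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    """One agent turn (hypothetical): the action type and an optional intermediate reward."""
    action: str
    intermediate_reward: Optional[float] = None


# Hypothetical per-action weights applied to the final (trajectory-level) reward.
ACTION_WEIGHTS: Dict[str, float] = {
    "tool_call": 0.5,
    "final_answer": 1.0,
}


def assign_credit(
    steps: List[Step],
    final_reward: float,
    weights: Dict[str, float] = ACTION_WEIGHTS,
    default_weight: float = 0.0,
) -> List[float]:
    """Return one reward per step, depending on the action taken.

    - A step that carries its own intermediate reward keeps that value.
    - Any other step receives the final reward scaled by its action's weight
      (unknown actions fall back to default_weight).
    """
    rewards: List[float] = []
    for step in steps:
        if step.intermediate_reward is not None:
            rewards.append(step.intermediate_reward)
        else:
            rewards.append(weights.get(step.action, default_weight) * final_reward)
    return rewards


if __name__ == "__main__":
    trajectory = [
        Step("tool_call"),
        Step("tool_call", intermediate_reward=-0.2),
        Step("final_answer"),
    ]
    print(assign_credit(trajectory, final_reward=1.0))  # [0.5, -0.2, 1.0]
```

Keeping the weights in a plain dict makes it easy to experiment with different per-action schemes without touching the loop; anything more elaborate (discounting, per-token rewards) would replace the body of `assign_credit`.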
We are planning to support a config option to customize this, but credit assignment involves complex logic in most cases, while verl only supports a declarative config. I suspect the final design might only relieve users of the "copy" step.