OMIGA

Confusing transformation of rewards into RTGs (reward-to-go)

Open • RZ-Q opened this issue 3 months ago • 1 comment

In the get_episode() function, the rewards are turned into reward-to-go values (RTGs), which is not described in the paper:

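for agent_trajectory in episode:
    rtgs = 0
    # Walk each agent's trajectory backwards, accumulating the undiscounted sum of future rewards
    for i in reversed(range(len(agent_trajectory))):
        rtgs += agent_trajectory[i][3][0]
        # Overwrite the stored reward with its reward-to-go
        agent_trajectory[i][3][0] = rtgs

return episode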
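For reference, the loop above computes the undiscounted reward-to-go, rtg[t] = r[t] + r[t+1] + ... + r[T]. A minimal standalone sketch of the same computation follows, assuming one agent's rewards have been pulled out into a plain list; `reward_to_go` is a hypothetical helper name for illustration, not a function from the repository.

```python
from typing import List


def reward_to_go(rewards: List[float]) -> List[float]:
    """Hypothetical helper: undiscounted reward-to-go for each timestep.

    rtg[t] = rewards[t] + rewards[t+1] + ... + rewards[-1]
    (no discount factor, mirroring the backward loop in get_episode()).
    """
    rtgs = [0.0] * len(rewards)
    running = 0.0
    # Iterate from the last step to the first so each step adds its own
    # reward to the running sum of all later rewards.
    for i in reversed(range(len(rewards))):
        running += rewards[i]
        rtgs[i] = running
    return rtgs


# Example: per-step rewards versus their reward-to-go values.
print(reward_to_go([1.0, 2.0, 3.0]))        # [6.0, 5.0, 3.0]
print(reward_to_go([0.0, 0.0, 0.0, 1.0]))   # [1.0, 1.0, 1.0, 1.0]
```

The second example shows why the change matters for a learner: with a single terminal reward, every earlier step now carries that reward instead of 0.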

Can you explain why the rewards should be turned into reward-to-go values? Is this transformation also applied to the other baselines? Sorry to bother you.

RZ-Q, Mar 21 '24 15:03