OMIGA
Confusing transformation from rewards to reward-to-gos (RTGs)
In the `get_episode()` function, the rewards are turned into reward-to-gos, which is not described in the paper:
```python
for agent_trajectory in episode:
    rtgs = 0
    for i in reversed(range(len(agent_trajectory))):
        rtgs += agent_trajectory[i][3][0]   # accumulate rewards from the end
        agent_trajectory[i][3][0] = rtgs    # overwrite the reward with its RTG
return episode
```
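For context, this loop computes an undiscounted reward-to-go, i.e. the suffix sum of the rewards. A minimal runnable sketch on a toy trajectory (assuming, as in the snippet, that each step is a list whose index 3 holds a one-element reward list; the surrounding layout is hypothetical):

```python
# Hypothetical step layout [obs, action, done, [reward]]; only index 3 matters here.
episode = [[[None, None, None, [r]] for r in (1.0, 2.0, 3.0)]]

for agent_trajectory in episode:
    rtgs = 0.0
    for i in reversed(range(len(agent_trajectory))):
        rtgs += agent_trajectory[i][3][0]   # running suffix sum of rewards
        agent_trajectory[i][3][0] = rtgs    # replace reward with reward-to-go

# Rewards (1, 2, 3) become reward-to-gos (6, 5, 3).
print([step[3][0] for step in episode[0]])  # -> [6.0, 5.0, 3.0]
```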
Could you explain why the rewards should be turned into reward-to-gos? And is this transformation also applied to the other baselines? Sorry to bother you.