OMIGA
Confusing transformation from rewards to reward-to-gos (RTGs)
In the `get_episode()` function, the rewards are turned into reward-to-gos, which is not described in the paper:
```python
for agent_trajectory in episode:
    rtgs = 0
    for i in reversed(range(len(agent_trajectory))):
        rtgs += agent_trajectory[i][3][0]   # accumulate rewards from the end
        agent_trajectory[i][3][0] = rtgs    # overwrite the reward with its RTG
return episode
```
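For context, this loop computes an undiscounted reward-to-go, i.e. the suffix sum of the rewards. A minimal runnable sketch on a toy trajectory (assuming, as in the snippet, that each step is a list whose index 3 holds a one-element reward list; the surrounding layout is hypothetical):

```python
# Hypothetical step layout [obs, action, done, [reward]]; only index 3 matters here.
episode = [[[None, None, None, [r]] for r in (1.0, 2.0, 3.0)]]

for agent_trajectory in episode:
    rtgs = 0.0
    for i in reversed(range(len(agent_trajectory))):
        rtgs += agent_trajectory[i][3][0]   # running suffix sum of rewards
        agent_trajectory[i][3][0] = rtgs    # replace reward with reward-to-go

# Rewards (1, 2, 3) become reward-to-gos (6, 5, 3).
print([step[3][0] for step in episode[0]])  # -> [6.0, 5.0, 3.0]
```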
Could you explain why the rewards should be turned into reward-to-gos? And is this transformation also applied to the other baselines? Sorry to bother you.