dreamer-pytorch
Reward loss timescale
Hi,
I believe the reward loss should be based on `rewards[1:]` instead of `rewards[:-1]`:
https://github.com/yusukeurakami/dreamer-pytorch/blob/7e9050e8c454309de40bd0d1a4ec0256ef600147/main.py#L209
If not, can you please explain your reasoning? Thanks!
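To make the off-by-one question concrete, here is a minimal sketch with dummy arrays. The variable names (`beliefs`, `rewards`, `T`) are illustrative, not the repo's actual ones; the only assumption taken from the linked line is that the current code uses `rewards[:-1]` as the reward-model targets, while this issue proposes `rewards[1:]`:

```python
import numpy as np

# Hypothetical rollout of T environment steps.
# Assume rewards[t] is the reward returned by env.step(actions[t]),
# and beliefs[t] is the latent state inferred at step t.
T = 5
beliefs = np.arange(T)           # 0, 1, 2, 3, 4
rewards = 10.0 * np.arange(T)    # 0., 10., 20., 30., 40.

# Targets as in the current code (main.py#L209):
targets_current = rewards[:-1]   # [ 0., 10., 20., 30.]

# Targets as proposed in this issue:
targets_proposed = rewards[1:]   # [10., 20., 30., 40.]

# Both slices have the same length T-1, so training runs either way --
# a wrong choice would be a silent one-step shift in what the reward
# model is asked to predict, not a shape error.
assert targets_current.shape == targets_proposed.shape == (T - 1,)
```

Because the two slices are shape-compatible, no error surfaces at runtime; the disagreement is purely about which timestep's reward a given latent state should predict.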