successor_examples

updating reward

Open devloper13 opened this issue 5 years ago • 0 comments

When you call the reward update function, you pass experience[-1]. Shouldn't it be experience[-2]? We already read S, A, R, S' from experience[-2] when updating the state dynamics, and we use experience[-1] only to get A'.
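To make the question concrete, here is a minimal sketch of how the two indices lead to different reward updates. It does not reproduce the repository's code: the tuple layout `(state, action, next_state, reward)`, the function `update_reward_weights`, and the delta-rule update are all assumptions made for illustration.

```python
# Hypothetical layout (an assumption, not the repo's actual format):
# each entry of `experience` is (state, action, next_state, reward).

def update_reward_weights(w, transition, lr=0.1):
    """Delta-rule update of per-state reward estimates for one transition."""
    _, _, s_next, r = transition
    new_w = list(w)  # copy so the caller's list is left untouched
    new_w[s_next] += lr * (r - new_w[s_next])
    return new_w

experience = [
    (0, 1, 1, 0.0),  # S=0, A=1, S'=1, R=0.0
    (1, 0, 2, 1.0),  # S=1, A=0, S'=2, R=1.0
]

# In a SARSA-style update, experience[-2] holds S, A, R, S',
# and experience[-1] is only needed for the next action A'.
a_prime = experience[-1][1]

w = [0.0, 0.0, 0.0]
w_from_prev = update_reward_weights(w, experience[-2])  # uses the same transition as the dynamics update
w_from_last = update_reward_weights(w, experience[-1])  # uses the newest transition instead
```

Under these assumptions the two calls update different states with different rewards, so the reward estimate and the state dynamics would be learned from different transitions if the indices are mixed.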

devloper13, Mar 31 '20 10:03