miniyosshi comments




miniyosshi

Why does the sequence of rewards start at t-1?

Hi, in relation to this problem, I found that the env doesn't get the action of the very first iteration. In the training loop, `prev_action = torch.zeros(1, trainer.action_size).to(trainer.device) # initialize` ... `next_obs, rew, done,...