Gianluca De Stefano

Results 2 comments of Gianluca De Stefano
trafficstars

You are completely right. If the episode didn't end, you use the critic network (or the critic head of your twinheaded actor critic network) to approximate V(final_state)

Your adjusted implementation is fine. I use the same semantic for my A2C rollouts where unfinished episodes are processed by calling critic(last_state). Else, the code just works with finished episodes.