Nikhil Barhate

Results: 1 comment by Nikhil Barhate

First, this repository does NOT use Generalized Advantage Estimation; it uses a Monte Carlo estimate to calculate `rewards_to_go` (the `reward` variable in the code), and computes `advantages` = `rewards_to_go` - `V(s_t)`. The only time we...
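To make the distinction concrete, here is a minimal sketch of a Monte Carlo `rewards_to_go` and the resulting `advantages`. It is not the repository's code: the function names, the `is_terminals` flags, and the use of plain Python lists are assumptions made for illustration.

```python
def rewards_to_go(rewards, is_terminals, gamma=0.99):
    """Discounted Monte Carlo returns for a flat rollout buffer.

    `is_terminals[t]` is assumed to be True on the last step of an episode,
    so the running return is reset there and episodes do not leak into
    each other.
    """
    returns = []
    discounted = 0.0
    # Walk the buffer backwards so each return reuses the one after it.
    for reward, done in zip(reversed(rewards), reversed(is_terminals)):
        if done:
            discounted = 0.0
        discounted = reward + gamma * discounted
        returns.append(discounted)
    returns.reverse()
    return returns


def advantages(returns, values):
    """advantages = rewards_to_go - V(s_t), element by element."""
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    # Two episodes stored back to back: three steps, then one step.
    rtg = rewards_to_go([1.0, 1.0, 1.0, 2.0], [False, False, True, True], gamma=0.5)
    print(rtg)
    print(advantages(rtg, [1.0, 1.0, 1.0, 1.0]))
```

Unlike GAE, this sketch has no lambda-weighted sum of TD residuals; the value estimates enter only as a baseline subtracted from the full discounted return.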