Nikhil Barhate
Results: 2 comments of Nikhil Barhate
First, this repository does NOT use Generalized Advantage Estimation; it uses a Monte Carlo estimate to calculate `rewards_to_go` (the `reward` variable in the code), and `advantages` = `rewards_to_go` - `V(s_t)`. The only time we...
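To make the distinction from GAE concrete, here is a minimal sketch of a Monte Carlo rewards-to-go computation followed by the advantage subtraction. It is not the repository's code: the function names `rewards_to_go` and `advantages`, the `dones` flags, and the default `gamma` are assumptions chosen for illustration, and plain Python lists stand in for tensors.

```python
from typing import List, Sequence


def rewards_to_go(
    rewards: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,  # assumed discount factor, not taken from the source
) -> List[float]:
    """Discounted Monte Carlo return from each timestep to the end of its episode."""
    returns: List[float] = []
    running = 0.0
    # Walk the rollout backwards so each return reuses the one after it.
    for reward, done in zip(reversed(rewards), reversed(dones)):
        if done:
            # Last step of an episode: do not carry returns across the boundary.
            running = 0.0
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns: Sequence[float], values: Sequence[float]) -> List[float]:
    """Advantage as rewards_to_go - V(s_t); no GAE lambda-weighting is involved."""
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    rtg = rewards_to_go([1.0, 1.0, 1.0], [False, False, True], gamma=0.5)
    print(rtg)
    print(advantages(rtg, [1.0, 1.0, 1.0]))
```

The backward pass resets the running return at every episode boundary, so a buffer holding several concatenated episodes can be processed in one sweep.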
The Decision Transformer paper does not provide results for the pointmaze environment. It is a difficult env, and I would not expect DT to work well on it out of the...