Sampreet
If someone wants to try this out, feel free to do so.
One issue is that the shapes are inferred directly using `foo.reshape(-1, some_size)` instead of `foo.reshape(some_known_size, some_size)`. Might take some playing around to figure out the problem.
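To make the reshape concern concrete, here is a minimal sketch using NumPy rather than the project's own tensors. The array, the sizes, and the helper names are all hypothetical and are not taken from the codebase. It shows how `-1` silently folds an unexpected extra dimension into the first axis, while an explicit size fails loudly.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch_size, n_envs, some_size = 4, 2, 3

# Suppose `foo` unexpectedly carries an extra n_envs dimension.
foo = np.zeros((batch_size, n_envs, some_size))


def reshape_inferred(arr, size):
    """Let NumPy infer the leading dimension (hides shape mistakes)."""
    return arr.reshape(-1, size)


def reshape_checked(arr, known_size, size):
    """Require the leading dimension to match what we expect."""
    return arr.reshape(known_size, size)


# Inferred reshape succeeds, but the first axis is batch_size * n_envs.
inferred = reshape_inferred(foo, some_size)
print(inferred.shape)

# Explicit reshape raises because 24 elements do not fit into (4, 3).
try:
    reshape_checked(foo, batch_size, some_size)
    explicit_error = None
except ValueError as exc:
    explicit_error = exc
print(type(explicit_error).__name__)
```

Swapping the inferred reshapes for explicit ones, one at a time, may be a quick way to find where the shapes first diverge from what is expected.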
The `categorical_q_target` function seems to be the main problem here. I tried reshaping and modifying it in several ways, but nothing worked. I'm guessing the problem is because of both `n_envs`...
We should try training these after adding CUDA support. Most of these were tested a long time ago, so we should test them again.
Things to be included:
- Example code to run the algo
- Links to the source docs of the relevant algos
- Hyperparameters/arguments you can customise. Like for example show...
Is the mean reward dropping? Also run `trainer.evaluate` for a few episodes to check if the final mean reward is 200.0 or not. Our logger rounds off the loss values....
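As a rough illustration of why rounded loss values can be misleading, the sketch below assumes two-decimal rounding. The logger's actual precision is not stated here, and the loss values are made up. Small losses that are still decreasing all print as `0.0`, whereas scientific notation keeps the trend visible.

```python
# Hypothetical loss values; not taken from any real run.
losses = [0.0049, 0.0031, 0.0012]

# Assumed two-decimal rounding: every value collapses to 0.0.
rounded = [round(loss, 2) for loss in losses]
print(rounded)

# Scientific notation preserves the downward trend.
scientific = [f"{loss:.2e}" for loss in losses]
print(scientific)
```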
Does it go to around 160-180 in the middle? It's a known issue that our A2C is unstable and suddenly drops in performance midway. If it's not training at all,...
Yeah, so it's fine for now. Not sure why our A2C collapses all of a sudden. There isn't any problem with the logic.
For anyone working on this, please add the files to `docs/source/tutorials`, not `docs/source`.
Timestep from the very beginning