Sampreet comments

Results 36 comments of Sampreet

Categorical DQN not training

If someone wants to try this out, feel free to do so.

Categorical DQN not training

One issue is that the shapes are inferred directly using `foo.reshape(-1, some_size)` instead of `foo.reshape(some_known_size, some_size)`. Might take some playing around to figure out the problem.
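To illustrate why an inferred dimension can hide a bug, here is a minimal sketch using NumPy rather than the project's own code. The sizes and names (`batch_size`, `n_envs`, `n_actions`, `num_atoms`, `probs`) are hypothetical and chosen only for illustration. The array still carries an extra action dimension, which `reshape(-1, ...)` silently absorbs, while an explicit shape fails loudly.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
batch_size, n_envs, n_actions, num_atoms = 4, 2, 3, 51

# An array that still carries an action dimension the caller forgot to index away.
probs = np.zeros((batch_size, n_envs, n_actions, num_atoms))

# With -1 the row count is inferred, so the reshape succeeds with the wrong
# number of rows (batch_size * n_envs * n_actions instead of batch_size * n_envs).
inferred = probs.reshape(-1, num_atoms)
print(inferred.shape)

# With an explicit shape the element-count mismatch is reported immediately.
try:
    probs.reshape(batch_size * n_envs, num_atoms)
    explicit_failed = False
except ValueError as err:
    explicit_failed = True
    print("explicit reshape failed:", err)
```

Writing the expected size out makes a shape mismatch surface at the reshape call instead of somewhere downstream in the loss computation.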

Categorical DQN not training

The `categorical_q_target` function seems to be the main problem here. I tried reshaping and modifying it a lot, but that didn't work. I'm guessing the problem is because of both `n_envs`...

PPO1, A2C, VPG, DQN not training for Atari envs

We should try training these after adding CUDA support. Most of these were tested a long time ago, so we should test them again.

Usage explanatory docs

Things to be included:
- Example code to run the algo
- Links to the source docs of the relevant algos
- Hyperparameters/arguments you can customise. Like for example show...

Usage explanatory docs

Is the mean reward dropping? Also, run `trainer.evaluate` for a few episodes to check whether the final mean reward is 200.0 or not. Our logger rounds off the loss values....

Usage explanatory docs

Does it go to around 160-180 in the middle? It's a known issue that our A2C is unstable and suddenly drops in performance midway. If it's not training at all,...

Usage explanatory docs

Yeah, so it's fine for now. Not sure why our A2C collapses all of a sudden. There isn't any problem with the logic.

Usage explanatory docs

For anyone working on this, please add the files to `docs/source/tutorials`, not `docs/source`.

Usage explanatory docs

Timestep from the very beginning