deep-rl-class
This repo contains the syllabus of the Hugging Face Deep Reinforcement Learning Course.
I've added comments on why I think each of these changes is useful. Thanks!
The provided implementation computes the policy loss using only the return G_0, as the sum over all t of G_0 * log_policy(a_t|s_t). However, the REINFORCE algorithm requires computing the returns...
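For context, here is a minimal sketch of the distinction this issue seems to raise, assuming the standard formulation of REINFORCE in which each log-probability is weighted by the discounted return from its own time step (G_t) rather than by G_0. The function names, the default `gamma`, and the convention that `rewards[t]` is the reward received after the action at step t are illustrative assumptions, not taken from the course code.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return [G_0, ..., G_{T-1}] using the recursion G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Negative sum over t of G_t * log pi(a_t | s_t) for one episode."""
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

With plain floats this only illustrates the arithmetic; in a training loop the log-probabilities would normally be tensors so the loss can be differentiated.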
I have a suggestion to update an example in [the blogpost for Unit 1](https://huggingface.co/blog/deep-rl-intro#rewards-and-the-discounting). I'm probably being somewhat nitpicky, and in general the example probably already works well to get...
Hi Thomas, I just wanted to share with you the script I use to experiment with Optuna on unit-3 for Space Invaders.
I've included steps to log the runs as per #16. Do let me know if any changes are needed before merging it :)
Creating a glossary of key terms would be very beneficial. Contributions for v0 are welcome :fire:
Hey there! I would like to make a notebook that helps others get started with logging their experiments with TensorBoard and wandb, along with pushing the logs to the Hub.
Hey guys, I think there are two typos in the **step 4** update rule. At the moment, it is written as: $Q(S_{t}, A_{t})\leftarrow Q(S_{t}, A_{t}) + \alpha [R_{t+1}+\gamma \max_{\alpha}Q(S_{t+1}, \alpha) - Q(S_{t}, A_{t})]$...
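For reference, the standard tabular Q-learning update takes the maximum over actions a in the next state, while $\alpha$ is only the learning rate. A minimal sketch under that assumption follows; the function name, the dictionary-based Q-table, and the default hyperparameters are hypothetical and not taken from the course notebooks.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step and return the updated Q(state, action).

    q maps (state, action) pairs to values; actions lists the actions
    available in next_state. Terminal states are not special-cased here.
    """
    # Greedy value of the next state: max over actions a, not over alpha.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the TD target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: an empty table where unseen pairs default to 0.0.
q_table = defaultdict(float)
q_table[(1, 0)] = 2.0
updated = q_learning_update(q_table, state=0, action=1, reward=1.0,
                            next_state=1, actions=[0, 1],
                            alpha=0.5, gamma=0.5)
```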
# What do you want to improve?
- Explain the typo/error or the part of the course you want to improve
- **Also, don't hesitate to open a Pull Request...