
This repo contains the syllabus of the Hugging Face Deep Reinforcement Learning Course.

132 deep-rl-class issues

I've added comments explaining why I think each of these changes is useful. Thanks!

The provided implementation computes the policy loss considering only the return G_0, as: sum over all t of G_0 * log_policy(a_t|s_t). However, the REINFORCE algorithm requires computing the returns...
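
For context, the per-timestep returns the issue refers to are the discounted returns-to-go G_t, computed backwards over the episode. A minimal sketch (function name and example values are mine, not from the course code):

```python
def returns_to_go(rewards, gamma=0.99):
    """Discounted returns-to-go: G_t = r_t + gamma * G_{t+1}."""
    G = 0.0
    returns = []
    # Iterate backwards so each step reuses the return of the step after it.
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    return returns

# Example: rewards [1, 1, 1] with gamma=0.5 -> [1.75, 1.5, 1.0]
```

Each log-probability term log_policy(a_t|s_t) is then weighted by its own G_t instead of the single constant G_0.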

I have a suggestion to update an example in [the blogpost for Unit 1](https://huggingface.co/blog/deep-rl-intro#rewards-and-the-discounting). I'm probably being somewhat nitpicky, and in general the example probably already works well to get...

Hi Thomas, I just wanted to share with you the script I use to experiment with Optuna on Unit 3 for Space Invaders.

I've included steps to log the runs as per #16. Do let me know if any changes are needed before merging it :)

Creating a glossary with key terms would be very beneficial. Contributions for v0 are welcome :fire:

Hey there! I would like to make a notebook that helps others get started with logging their experiments with tensorboard and wandb, along with pushing the logs to the Hub.

Hey guys, I think there are two typos in the **step 4** update rule. At the moment, it is written as: $Q(S_{t}, A_{t})\leftarrow Q(S_{t}, A_{t}) + \alpha [R_{t+1}+\gamma \max_{\alpha}Q(S_{t+1}, \alpha) - Q(S_{t}, A_{t})]$...
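
For reference, in the standard Q-learning update the max is taken over actions $a$, not over the learning rate $\alpha$: $Q(S_{t}, A_{t})\leftarrow Q(S_{t}, A_{t}) + \alpha [R_{t+1}+\gamma \max_{a}Q(S_{t+1}, a) - Q(S_{t}, A_{t})]$. A minimal tabular sketch of that update (function name mine):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step on table Q, indexed as Q[state, action]."""
    # TD target: immediate reward plus discounted best value of the next state,
    # where the max ranges over the available actions a.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

With `alpha` as the learning rate, the two symbols play entirely different roles, which is why the quoted formula reads as a typo.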

# What do you want to improve?

- Explain the typo/error or the part of the course you want to improve
- **Also, don't hesitate to open a Pull Request...