deep-rl-class issues

Clear up some unneeded and confusing parts of Unit 2

2

I've added comments explaining why I think each of these changes is useful. Thanks!

Fix unit5 reinforce implementation

2

The provided implementation computes the policy loss using only the return G_0, as the sum over all t of G_0 * log_policy(a_t|s_t). However, the REINFORCE algorithm requires computing the returns...

Chris1nexus
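
To make the distinction in the REINFORCE issue above concrete, here is a minimal sketch that contrasts a loss weighted only by G_0 with one weighted by a per-timestep return G_t. The issue text is truncated, so the assumption that it refers to the usual reward-to-go return is mine. The function names, the reward values, and the log-probabilities are hypothetical and are not taken from the course notebook.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute G_t for every timestep t of one episode.

    Assumption: rewards[t] is the reward received after the action taken
    at step t, so G_t = rewards[t] + gamma * G_{t+1}.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses the already computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def policy_loss(log_probs: List[float], weights: List[float]) -> float:
    """Negative sum of log pi(a_t | s_t) weighted by the given returns."""
    return -sum(lp * w for lp, w in zip(log_probs, weights))


# Hypothetical episode data, chosen only for illustration.
rewards = [1.0, 1.0, 1.0]
log_probs = [-0.5, -0.25, -1.0]
gamma = 0.5

returns = discounted_returns(rewards, gamma)

# Variant described in the issue: every step is weighted by G_0.
loss_g0_only = policy_loss(log_probs, [returns[0]] * len(log_probs))

# Per-timestep variant: step t is weighted by its own return G_t.
loss_per_step = policy_loss(log_probs, returns)
```

With these illustrative numbers the two losses differ, which is the kind of discrepancy the issue describes. In a real training loop the log-probabilities would be tensors produced by the policy network rather than plain floats.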

Suggested update example Unit 1: discounting

1

I have a suggestion to update an example in [the blogpost for Unit 1](https://huggingface.co/blog/deep-rl-intro#rewards-and-the-discounting). I'm probably being somewhat nitpicky, and in general the example probably already works well to get...

DennisSoemers

Feature/optuna unit3

2

Hi Thomas, I just wanted to share with you the script I use to experiment with Optuna on Unit 3 for Space Invaders.

micheljperez

Logging with Tensorboard and Wandb

1

I've included steps to log the runs as per #16. Do let me know if any changes are needed before merging it :)

SuperSecureHuman

[Contributions welcomed] Create a glossary

Creating a glossary with key words would be very beneficial. Contributions for v0 are welcomed :fire:

osanseviero

Add content about certification

osanseviero

Logging with tensorboard and wandb

2

Hey there! I would like to make a notebook that helps others get started with logging their experiments with TensorBoard and wandb, along with pushing the logs to the Hub.

SuperSecureHuman

Unit II - Part II Update Rule for Q-values

2

Hey guys, I think there are two typos in the **step 4** update rule. At the moment, it is written as: $Q(S_{t}, A_{t})\leftarrow Q(S_{t}, A_{t}) + \alpha [R_{t+1}+\gamma \max_{\alpha}Q(S_{t+1}, \alpha) - Q(S_{t}, A_{t})]$...

EvanMath
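
For reference alongside the update-rule issue above, here is a minimal sketch of the textbook tabular Q-learning update, in which the maximum is taken over actions a in the next state and α only appears as the learning rate. The issue text is truncated, so this is not necessarily the exact correction it proposes. The function name, the dictionary-based Q-table, and the state and action labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_learning_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float,
    gamma: float,
) -> float:
    """Apply one Q-learning step and return the updated Q(state, action).

    Q(S_t, A_t) <- Q(S_t, A_t)
                   + alpha * (R_{t+1} + gamma * max_a Q(S_{t+1}, a) - Q(S_t, A_t))
    """
    # Greedy value of the next state: maximum over actions, not over alpha.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; unseen entries default to 0.0.
q: QTable = defaultdict(float)
actions = ["left", "right"]
q[("s1", "right")] = 2.0

new_value = q_learning_update(
    q, state="s0", action="right", reward=1.0,
    next_state="s1", actions=actions, alpha=0.5, gamma=0.5,
)
```

This sketch does not special-case terminal states, where the bootstrapped term is normally dropped.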

[UPDATE] An error in quiz 2

1

# What do you want to improve? - Explain the typo/error or the part of the course you want to improve - **Also, don't hesitate to open a Pull Request...

S-N-O-R-L-A-X
