
Unit II - Part II Update Rule for Q-values

EvanMath opened this issue 3 years ago • 2 comments

Hey guys,

I think there are two typos in the step 4 update rule. At the moment, it is written as:

$Q(S_{t}, A_{t})\leftarrow Q(S_{t}, A_{t}) + \alpha [R_{t+1}+\gamma \max_{\alpha}Q(S_{t+1}, \alpha) - Q(S_{t}, A_{t})]$

Instead, I think it should be: $Q(S_{t}, A_{t})\leftarrow Q(S_{t}, A_{t}) + \alpha [R_{t}+\gamma \max_{\alpha}Q(S_{t+1}, A_{t+1}) - Q(S_{t}, A_{t})]$

where $A_{t+1}$ is the best action in the next state and $R_{t}$ refers to the immediate reward at step $t$.

Regards, Vangelis

EvanMath avatar Aug 16 '22 14:08 EvanMath

Hey there 👋 ,

So indeed there's a mistake here, but for me it's that it should be $\max_{a'} Q(S_{t+1}, a')$, where $a'$ indicates the best action $A_{t+1}$.

It's not supposed to be $\alpha$ but $a$.

https://i.stack.imgur.com/OMzXf.png
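
For concreteness, here is a minimal sketch of the corrected update in Python. The tabular Q-array, state/action indices, and hyperparameter values are illustrative assumptions, not the course's actual code:

```python
import numpy as np

# Illustrative tabular setup: Q-table of shape (n_states, n_actions).
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))

alpha = 0.1   # learning rate -- the alpha in front of the bracket
gamma = 0.99  # discount factor

def q_learning_update(s_t, a_t, r_t1, s_t1):
    """One Q-learning step:
    Q(S_t, A_t) <- Q(S_t, A_t)
                   + alpha * [R_{t+1} + gamma * max_{a'} Q(S_{t+1}, a') - Q(S_t, A_t)]
    Note the max is taken over the next actions a', not over alpha.
    """
    td_target = r_t1 + gamma * np.max(Q[s_t1])  # max_{a'} Q(S_{t+1}, a')
    td_error = td_target - Q[s_t, a_t]
    Q[s_t, a_t] += alpha * td_error
```

The key point from the correction above is visible in the code: `alpha` only scales the TD error, while the maximization runs over the action dimension of the Q-table.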

simoninithomas avatar Aug 22 '22 06:08 simoninithomas

Agree with that, as $\alpha$ refers to the learning rate. As for the immediate reward, IMHO it should be $R_{t}$, unless it depends on the convention for when the timestep advances.

EvanMath avatar Aug 22 '22 08:08 EvanMath

Hey there 👋, we updated the course, so I'm closing the issue. Thanks again 🤗

simoninithomas avatar Dec 20 '22 12:12 simoninithomas