Unit II - Part II: Update Rule for Q-values
Hey guys,
I think there are two typos in the step 4 update rule. At the moment, it is written as:
$Q(S_{t}, A_{t})\leftarrow Q(S_{t}, A_{t}) + \alpha [R_{t+1}+\gamma \max_{\alpha}Q(S_{t+1}, \alpha) - Q(S_{t}, A_{t})]$
Instead, I think it should be: $Q(S_{t}, A_{t})\leftarrow Q(S_{t}, A_{t}) + \alpha [R_{t}+\gamma \max_{\alpha}Q(S_{t+1}, A_{t+1}) - Q(S_{t}, A_{t})]$
where $A_{t+1}$ is the best action of the next state and $R_{t}$ refers to the immediate reward at step $t$.
Regards, Vangelis
Hey there 👋 ,
So indeed there's a mistake here, but for me it's that it should be $\max_{a'} Q(S_{t+1}, a')$, where $a'$ indicates the best action $A_{t+1}$.
It's not supposed to be $\alpha$ (the learning rate) but $a$ (an action).
https://i.stack.imgur.com/OMzXf.png
Agree with that, as $\alpha$ refers to the learning rate. As for the immediate reward, IMHO it should be $R_{t}$, unless it relates to when we suppose the timestep changes.
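For what it's worth, here's a minimal sketch of the agreed-upon update in Python (hypothetical `Q` table and function name, not the course's actual code), just to show that the max runs over actions $a'$ while $\alpha$ only scales the step:

```python
import numpy as np

def q_learning_update(Q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99):
    # Q      -- (n_states, n_actions) array of action values
    # reward -- immediate reward observed for this transition,
    #           whichever index convention (R_t vs R_{t+1}) you prefer
    # alpha  -- learning rate (the Greek letter in the formula)
    # gamma  -- discount factor

    # TD target: R + gamma * max_{a'} Q(S_{t+1}, a').
    # The max is over actions a', a different symbol from alpha.
    td_target = reward + gamma * np.max(Q[next_state])

    # Move Q(S_t, A_t) a step of size alpha toward the target.
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q
```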
Hey there 👋, we updated the course, so I'm closing the issue. Thanks again 🤗