deep-rl-class Q-Learning pseudocode | Mathematical notation

Open fardinafdideh opened this issue 1 year ago • 0 comments

Hi,

My remark is about the mathematical notation of the Q-Learning pseudocode in unit2.ipynb. I found the following notation a little confusing:

Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]

The maximization should be taken over all possible values of the action variable (the second argument) of the two-variable function Q. As written, however, max Q(s',a') reads as maximizing Q at the specific point (s', a'), where both arguments are already fixed, so there is nothing left to maximize over. The notation becomes clearer if general variables are written in lowercase and specific points in uppercase: for example, the function Q(s, a) evaluated at the specific point s=S, a=A is written Q(S, A). So:

Current version: max Q(s',a') implies maximizing the two-variable function Q at the specific points s' and a' (since s' has been defined as a specific point).
Suggested version: max_a Q(S',a) implies maximizing the Q function over its second variable, a, with its first variable fixed at the specific point S'.
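
To make the suggested reading concrete, here is a minimal Python sketch of a single tabular update. It is not the notebook's code: the function name `q_learning_update`, the list-of-lists `q_table`, and the numeric values are all assumptions made for illustration. The point it shows is that the next state S' is fixed while the maximum runs over every action a.

```python
def q_learning_update(q_table, state, action, reward, next_state, lr, gamma):
    """Apply one tabular Q-Learning update in place (illustrative sketch).

    q_table[s][a] holds Q(s, a). The next state is a fixed, observed point S',
    and the maximum is taken over the action index only: max_a Q(S', a).
    """
    # max_a Q(S', a): scan every action value in the row of the fixed next state.
    best_next_value = max(q_table[next_state])
    # TD target: R + gamma * max_a Q(S', a)
    td_target = reward + gamma * best_next_value
    # Q(S, A) <- Q(S, A) + lr * (TD target - Q(S, A))
    q_table[state][action] += lr * (td_target - q_table[state][action])
    return q_table


# Hypothetical toy table: 3 states, 2 actions, values chosen by hand.
q_table = [[0.0, 0.0], [0.5, 2.0], [0.0, 0.0]]
q_learning_update(q_table, state=0, action=1, reward=1.0,
                  next_state=1, lr=0.5, gamma=0.9)
print(q_table[0][1])
```

Here the row `q_table[1]` plays the role of Q(S', ·), and only the entry for the visited pair (S, A) is changed.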

Dec 09 '23 15:12 fardinafdideh
