deep-rl-class
Q-Learning pseudocode | Mathematical notation
Hi,

My remark is about the mathematical notation of the Q-Learning pseudocode in unit2.ipynb. I found the following notation a little confusing:

Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]

The maximization should be taken over all possible values of the action variable (the second argument of the two-variable function Q). As written, however, max Q(s',a') reads as maximizing Q at the specified points s' and a' as its first and second arguments. The notation becomes clearer if general variables are written with lowercase letters and specified points with capital letters, e.g., the function Q(s, a) evaluated at the specified points s=S and a=A is written Q(S, A). So:
- Current version: max Q(s',a') implies maximization of the two-variable function Q at the specified points s' and a' (since s' has been defined to be a specified point).
- Suggested version: max_a Q(S',a) implies maximization of the Q function with its first argument fixed at the specific point S' and taken over its second argument, i.e., a.
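
To make the suggested reading concrete, here is a minimal Python sketch of a single tabular update. It assumes Q is stored as a list of rows indexed as Q[state][action]; the function name q_learning_update and the toy numbers are hypothetical and are not taken from unit2.ipynb.

```python
def q_learning_update(Q, S, A, R, S_next, lr=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the TD target.

    Q is assumed to be a list of rows, indexed as Q[state][action].
    """
    # max_a Q(S', a): the next state S_next is a fixed point,
    # and the maximum runs over the action argument only.
    best_next_value = max(Q[S_next])
    td_target = R + gamma * best_next_value
    # Q(S, A) <- Q(S, A) + lr * [R + gamma * max_a Q(S', a) - Q(S, A)]
    Q[S][A] = Q[S][A] + lr * (td_target - Q[S][A])
    return td_target


# Toy table with 3 states and 2 actions (hypothetical values).
Q = [[0.0, 0.0], [0.5, 2.0], [0.0, 0.0]]
target = q_learning_update(Q, S=0, A=1, R=1.0, S_next=1, lr=0.5, gamma=0.9)
```

Note that S_next is passed in as a concrete value, while no next action appears among the arguments: the action is only a bound variable inside the max, which is what the max_a Q(S',a) notation expresses.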