rlai-exercises
Answer to Exercise 2.4 is wrong
Hi, Hector
I am referring to the second edition of the book.
Exercise 2.4: If the step-size parameters, $\alpha_n$, are not constant, then the estimate $Q_n$ is a weighted average of previously received rewards with a weighting different from that given by (2.6). What is the weighting on each prior reward for the general case, analogous to (2.6), in terms of the sequence of step-size parameters?
If you check the coefficient of $R_n$ in your answer, it comes out to be $\alpha_n \prod_{i=1}^{n}(1-\alpha_i)$, whereas the actual coefficient is $\alpha_n$.
Hence the correct formulation should be the following:
$$Q_{n+1} = \prod_{i=1}^{n}(1-\alpha_i)\,Q_1 + \sum_{i=1}^{n}\alpha_i \prod_{j=i}^{n}(1-\alpha_j)\,R_i$$
i.e. iterate from j = i to n instead of j = 1 to i.
Please correct me if I am wrong. Thank you.
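Correction: the inner product should start at $j = i+1$, not $j = i$, so that the coefficient of $R_n$ is $\alpha_n$ (the product is empty when $i = n$):
$$Q_{n+1} = \prod_{i=1}^{n}(1-\alpha_i)\,Q_1 + \sum_{i=1}^{n}\alpha_i \prod_{j=i+1}^{n}(1-\alpha_j)\,R_i$$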
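For anyone who wants to check the indexing numerically, here is a minimal sketch. It assumes the incremental update Q_{n+1} = Q_n + α_n (R_n - Q_n) from the book; the function names and the `start_offset` parameter are hypothetical and only serve to switch the inner product between starting at j = i and starting at j = i + 1.

```python
def recursive_estimate(q1, alphas, rewards):
    """Apply Q_{n+1} = Q_n + alpha_n * (R_n - Q_n) step by step."""
    q = q1
    for a, r in zip(alphas, rewards):
        q += a * (r - q)
    return q


def closed_form(q1, alphas, rewards, start_offset):
    """Closed-form weighted sum.

    start_offset=1 starts the inner product at j = i + 1,
    start_offset=0 starts it at j = i (1-based indices in the thread;
    the lists here are 0-based, which leaves the offset unchanged).
    """
    n = len(alphas)
    # Weight on the initial estimate Q_1: product of all (1 - alpha_i).
    weight_q1 = 1.0
    for a in alphas:
        weight_q1 *= 1.0 - a
    total = weight_q1 * q1
    # Weight on each reward R_i: alpha_i times the later (1 - alpha_j) factors.
    for i in range(n):
        w = alphas[i]
        for j in range(i + start_offset, n):
            w *= 1.0 - alphas[j]
        total += w * rewards[i]
    return total


if __name__ == "__main__":
    q1 = 0.5
    alphas = [0.5, 0.25, 0.1]
    rewards = [1.0, 2.0, 3.0]
    print("recursive update   :", recursive_estimate(q1, alphas, rewards))
    print("product from j=i+1 :", closed_form(q1, alphas, rewards, 1))
    print("product from j=i   :", closed_form(q1, alphas, rewards, 0))
```

With these sample values, only the version whose product starts at j = i + 1 agrees with the step-by-step update. The j = i version multiplies every reward by an extra factor of (1 - α_i).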