rlai-exercises Answer of exercise 2.4 is wrong

Open AbhishekVarghese opened this issue 4 years ago • 1 comment

Hi, Hector

I am referring to the second edition of the book.

Exercise 2.4: If the step-size parameters, $\alpha_n$, are not constant, then the estimate $Q_n$ is a weighted average of previously received rewards with a weighting different from that given by (2.6). What is the weighting on each prior reward for the general case, analogous to (2.6), in terms of the sequence of step-size parameters?

If you check the coefficient of $R_n$ in your answer, it comes out to be $\alpha_n \prod_{i=1}^{n} (1-\alpha_i)$, whereas the correct coefficient is $\alpha_n$.

Hence the correct formulation should be the following: $Q_{n+1} = Q_1 \prod_{i=1}^{n} (1-\alpha_i) + \sum_{i=1}^{n} \alpha_i R_i \prod_{j=i}^{n} (1-\alpha_j)$

i.e. iterate from j = i to n instead of j = 1 to i.

Please correct me if I am wrong. Thank you.

Jun 25 '20 06:06 AbhishekVarghese

$Q_{n+1} = Q_1 \prod_{i=1}^{n} (1-\alpha_i) + \sum_{i=1}^{n} \alpha_i R_i \prod_{j=i+1}^{n} (1-\alpha_j)$
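
A quick numerical check may help settle which lower limit is right. The sketch below assumes the usual incremental update $Q_{n+1} = Q_n + \alpha_n (R_n - Q_n)$ and compares it with the weighted-sum form for both choices of the inner product's starting index. The function names, the `start_offset` parameter, and the sample values are illustrative only and do not come from the repository.

```python
import math


def incremental_estimate(q1, alphas, rewards):
    """Apply Q_{n+1} = Q_n + alpha_n * (R_n - Q_n) once per reward."""
    q = q1
    for alpha, reward in zip(alphas, rewards):
        q += alpha * (reward - q)
    return q


def closed_form_estimate(q1, alphas, rewards, start_offset=1):
    """Weighted-sum form of the estimate.

    In 1-based notation the inner product runs over j = i + start_offset .. n:
    start_offset=1 gives j = i+1 .. n, start_offset=0 gives j = i .. n.
    """
    n = len(alphas)
    # Weight on the initial estimate Q_1: product of (1 - alpha_i) over all i.
    total = q1 * math.prod(1 - a for a in alphas)
    for i in range(n):  # 0-based i stands for 1-based index i + 1
        tail = math.prod(1 - alphas[j] for j in range(i + start_offset, n))
        total += alphas[i] * tail * rewards[i]
    return total


# Arbitrary sample values (an assumption, chosen only for illustration).
q1 = 0.5
alphas = [0.5, 0.25, 0.1]
rewards = [1.0, 2.0, 3.0]

print("incremental:      ", incremental_estimate(q1, alphas, rewards))
print("product from i+1: ", closed_form_estimate(q1, alphas, rewards, start_offset=1))
print("product from i:   ", closed_form_estimate(q1, alphas, rewards, start_offset=0))
```

With these sample values only the version whose product starts at j = i+1 reproduces the incremental result; starting at j = i multiplies each reward's weight by an extra factor of (1 - α_i), so the coefficient of the latest reward is no longer α_n.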

Jul 28 '20 13:07 niuwagege
