Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Question about Exercise 5.13

Open 315930399 opened this issue 4 years ago • 0 comments

I really can't understand the proof of per-decision importance sampling in Section 5.9. As I see it, $\rho_{t:t+k-1} R_{t+k}$ depends on $S_t, A_t, \ldots, S_{t+k-1}, A_{t+k-1}$, while $\rho_{t+k:T-1}$ depends on $S_{t+k}, A_{t+k}, \ldots, S_{T-1}, A_{T-1}$.

Since $S_t, A_t, \ldots, S_{t+k-1}, A_{t+k-1}$ and $S_{t+k}, A_{t+k}, \ldots, S_{T-1}, A_{T-1}$ are not independent, $\rho_{t:t+k-1} R_{t+k}$ and $\rho_{t+k:T-1}$ should not be independent either.

Hoping for your reply, thanks.
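One way to probe this concern is an exact enumeration over a tiny episode with $t = 0$, $k = 1$, $T = 2$. Everything in the sketch below (the states `a`/`b`, both policies, the rewards, and the helper names `joint_distribution` and `expect`) is a hypothetical assumption made up for illustration, not taken from the book. It computes the joint distribution of $X = \rho_{0:0} R_1$ and $Y = \rho_{1:1}$ under the behavior policy $b$. In this example the two are indeed dependent, yet $\mathbb{E}_b[XY] = \mathbb{E}_b[X]$. That is consistent with a conditional-expectation argument, which only needs $\mathbb{E}_b[\rho_{t+k:T-1} \mid S_t, A_t, \ldots, A_{t+k-1}, R_{t+k}, S_{t+k}] = 1$ rather than full independence.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical toy episode: t = 0, k = 1, T = 2. All numbers are made up.
# Behavior policy b and target policy pi at the start state S_0.
B0 = {0: F(1, 2), 1: F(1, 2)}
PI0 = {0: F(9, 10), 1: F(1, 10)}
# Deterministic transition after A_0: action -> (S_1, R_1).
STEP0 = {0: ("a", F(1)), 1: ("b", F(2))}
# Behavior and target policies at each possible S_1.
B1 = {"a": {0: F(1, 2), 1: F(1, 2)}, "b": {0: F(1, 2), 1: F(1, 2)}}
PI1 = {"a": {0: F(1, 2), 1: F(1, 2)}, "b": {0: F(1), 1: F(0)}}


def joint_distribution():
    """Exact joint law of X = rho_{0:0} * R_1 and Y = rho_{1:1} under b."""
    joint = defaultdict(F)  # F() == 0, so missing keys start at zero
    for a0, p0 in B0.items():
        s1, r1 = STEP0[a0]
        x = PI0[a0] / p0 * r1  # rho_{t:t+k-1} * R_{t+k}
        for a1, p1 in B1[s1].items():
            y = PI1[s1][a1] / p1  # rho_{t+k:T-1}
            joint[(x, y)] += p0 * p1
    return dict(joint)


def expect(joint, f):
    """Expectation of f(X, Y) under the exact joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())


if __name__ == "__main__":
    joint = joint_distribution()
    # Dependence: P(X = 9/5, Y = 1) differs from P(X = 9/5) * P(Y = 1).
    p_joint = joint.get((F(9, 5), F(1)), F(0))
    p_x = sum(p for (x, _), p in joint.items() if x == F(9, 5))
    p_y = sum(p for (_, y), p in joint.items() if y == F(1))
    print("P(X=9/5, Y=1) =", p_joint, "vs P(X=9/5)P(Y=1) =", p_x * p_y)
    # Yet the later ratio still averages out: E[XY] equals E[X].
    e_x = expect(joint, lambda x, y: x)
    e_y = expect(joint, lambda x, y: y)
    e_xy = expect(joint, lambda x, y: x * y)
    print("E[XY] =", e_xy, " E[X] =", e_x, " E[Y] =", e_y)
    # Both match the on-policy target E_pi[R_1].
    print("E_pi[R_1] =", sum(PI0[a] * STEP0[a][1] for a in PI0))
```

Running it prints `P(X=9/5, Y=1) = 1/2 vs P(X=9/5)P(Y=1) = 1/4`, then `E[XY] = 11/10  E[X] = 11/10  E[Y] = 1`, and finally `E_pi[R_1] = 11/10`. `Fraction` is used instead of floats so the equalities are exact rather than approximate.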

315930399, Dec 01 '20 07:12