Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions
Question about Exercise 5.13
I really can't understand the proof of per-decision importance sampling in Section 5.9. In my opinion, rho(t:t+k-1)*R(t+k) depends on S(t), A(t), ..., S(t+k-1), A(t+k-1), and rho(t+k:T-1) depends on S(t+k), A(t+k), ..., S(T-1), A(T-1). Since S(t), A(t), ..., S(t+k-1), A(t+k-1) and S(t+k), A(t+k), ..., S(T-1), A(T-1) are not independent, rho(t:t+k-1)*R(t+k) and rho(t+k:T-1) should not be independent either. Hoping for your reply, thanks.
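To make the quantities in question concrete, here is a minimal sketch that checks by exact enumeration whether E_b[rho(t:T-1)*R(t+k)] equals E_b[rho(t:t+k-1)*R(t+k)], taking t = 0. Everything in it is an assumption made up for illustration and not taken from the book or this repo: the two-state MDP, the policies `pi` and `b`, the `reward` function, and the horizon `T`. The two factors are indeed dependent here, since the policies and transitions depend on the state. One way the equality could still hold without independence is through conditioning: given the trajectory up to S(t+k), the expected value of the tail ratio rho(t+k:T-1) under b is 1.

```python
from itertools import product
from math import prod

# Hypothetical toy MDP (illustration only): 2 states, 2 actions, horizon T = 3.
STATES, ACTIONS, T = (0, 1), (0, 1), 3

# P[s][a][s2] = probability of moving to s2 after taking action a in state s.
P = {0: {0: (0.9, 0.1), 1: (0.2, 0.8)},
     1: {0: (0.5, 0.5), 1: (0.3, 0.7)}}
pi = {0: (0.8, 0.2), 1: (0.1, 0.9)}  # target policy pi(a|s), assumed
b = {0: (0.5, 0.5), 1: (0.4, 0.6)}   # behavior policy b(a|s), assumed


def reward(s, a, s2):
    """Arbitrary deterministic reward for the transition (s, a, s2)."""
    return 1.0 + 2.0 * s + 3.0 * a - 1.5 * s2


def expectations(k):
    """Return (E_b[rho(0:T-1) * R(k)], E_b[rho(0:k-1) * R(k)]) for 1 <= k <= T.

    Both expectations are computed exactly by summing over every
    trajectory that starts in S(0) = 0 and follows the behavior policy.
    """
    full = trunc = 0.0
    for actions in product(ACTIONS, repeat=T):
        for nexts in product(STATES, repeat=T):
            states = (0,) + nexts
            prob = 1.0   # probability of this trajectory under b
            rhos = []    # per-step ratios pi(a|s) / b(a|s)
            for i in range(T):
                s, a, s2 = states[i], actions[i], states[i + 1]
                prob *= b[s][a] * P[s][a][s2]
                rhos.append(pi[s][a] / b[s][a])
            r_k = reward(states[k - 1], actions[k - 1], states[k])
            head = prod(rhos[:k])  # rho(0:k-1)
            tail = prod(rhos[k:])  # rho(k:T-1); empty product is 1 when k == T
            full += prob * head * tail * r_k
            trunc += prob * head * r_k
    return full, trunc


for k in range(1, T + 1):
    full, trunc = expectations(k)
    print(k, full, trunc)
```

If the two printed columns agree for every k, that supports reading the proof as a conditional-expectation argument: the tail ratio averages to 1 given everything before it, so it drops out of the expectation even though it is not independent of the head.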