Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions Ex 4.1

Ex 4.1

Open StoyanVenDimitrov opened this issue 4 years ago • 1 comment

Hi,

How do you arrive at a value of -14 for state 11?

Feb 11 '21 11:02 StoyanVenDimitrov

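Example 4.1 is an undiscounted episodic task: the reward is -1 on every transition until the terminal state is reached.

State 11 is not a terminal state, so its value (not its reward, which is always -1 per step) is the expected sum of those -1 rewards until termination under the equiprobable random policy. That expectation works out to -14.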
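To check the number, here is a minimal sketch of iterative policy evaluation. It assumes the 4x4 gridworld of Example 4.1 with cells numbered row by row (so state 11 sits at row 2, column 3, zero-based), terminal states in the top-left and bottom-right corners, moves off the grid leaving the state unchanged, and the equiprobable random policy. The names `step`, `evaluate_random_policy`, and `q_value` are illustrative and not taken from the repository.

```python
N = 4                                          # 4x4 grid
TERMINALS = {(0, 0), (N - 1, N - 1)}           # top-left and bottom-right corners
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(r, c, action):
    """Return the next cell; moves that leave the grid keep the state unchanged."""
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < N and 0 <= nc < N:
        return nr, nc
    return r, c


def evaluate_random_policy(theta=1e-10):
    """In-place iterative policy evaluation, gamma = 1, reward -1 per transition."""
    v = [[0.0] * N for _ in range(N)]
    while True:
        delta = 0.0
        for r in range(N):
            for c in range(N):
                if (r, c) in TERMINALS:
                    continue                   # terminal values stay at 0
                new = 0.0
                for action in ACTIONS:         # each action has probability 1/4
                    nr, nc = step(r, c, action)
                    new += 0.25 * (-1.0 + v[nr][nc])
                delta = max(delta, abs(new - v[r][c]))
                v[r][c] = new
        if delta < theta:
            return v


def q_value(v, state, action):
    """q(s, a) = -1 + v(s') for this deterministic, undiscounted gridworld."""
    r, c = divmod(state, N)                    # assumed row-by-row numbering
    nr, nc = step(r, c, action)
    return -1.0 + v[nr][nc]


v = evaluate_random_policy()
print(round(v[2][3], 3))                       # value of state 11, approx. -14.0
print(round(q_value(v, 7, "down"), 3))         # -1 + v(11), approx. -15.0
```

The second print reuses the same value: taking `down` from state 7 lands in state 11, so the action value is the -1 step reward plus the value of state 11.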

Mar 07 '21 00:03 Kin-Zhang