python-machine-learning-book-3rd-edition

book errata p 682

Open elfelround opened this issue 4 years ago • 3 comments

[Attached image: book errata, p. 682]

elfelround, May 19 '20 13:05

Thanks for pointing out this erratum. You are right; I think I missed counting the final step. At t=7, an action is taken to reach the terminal state T.

So for episode 1, t goes from 0 to 8 (the terminal state is reached at t=8). Recall that the sequence is denoted by:

  • t=0: <S0, A0, R1>
  • t=1: <S1, A1, R2>
  • t=2: <S2, A2, R3>
  • ...
  • t=7: <S7, A7, R8>
  • Terminal state T

Also, we know the following immediate rewards: R1=0, R2=0, ..., R6=0, R7=0, R8=1.
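For reference, each return in the lists below is an instance of the discounted return, where T is the time step at which the terminal state is reached (T=8 in episode 1, T=10 in episode 2). Because only the final reward R_T is nonzero in both episodes, every line collapses to G_t = gamma^(T-t-1) * R_T.

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{T-t-1} R_T
    = \sum_{k=0}^{T-t-1} \gamma^{k} R_{t+k+1}
    = R_{t+1} + \gamma\, G_{t+1}, \qquad G_T = 0
```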

So now let's calculate the returns for the first episode:

  • At t=0: S_0 = B => G_0 = R1 + gamma * R2 + ... + gamma^6 * R7 + gamma^7 * R8 = gamma^7 * R8
  • At t=1: S_1 = B => G_1 = R2 + gamma * R3 + ... + gamma^5 * R7 + gamma^6 * R8 = gamma^6 * R8
  • At t=2: S_2 = C => G_2 = R3 + gamma * R4 + ... + gamma^4 * R7 + gamma^5 * R8 = gamma^5 * R8
  • At t=3: S_3 = C => G_3 = R4 + gamma * R5 + ... + gamma^3 * R7 + gamma^4 * R8 = gamma^4 * R8
  • At t=4: S_4 = C => G_4 = R5 + gamma * R6 + gamma^2 * R7 + gamma^3 * R8 = gamma^3 * R8
  • At t=5: S_5 = C => G_5 = R6 + gamma * R7 + gamma^2 * R8 = gamma^2 * R8
  • At t=6: S_6 = B => G_6 = R7 + gamma * R8 = gamma * R8 = gamma
  • At t=7: S_7 = A => G_7 = R8 = 1
  • Terminal state T
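As a quick numerical check of the episode 1 list, the sketch below computes every G_t with the backward recursion G_t = R_{t+1} + gamma * G_{t+1}, starting from G_T = 0 at the terminal state. The helper name `discounted_returns` and the value gamma = 0.9 are assumptions for illustration only; the thread keeps gamma symbolic.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, ..., G_{T-1}] for rewards [R_1, ..., R_T].

    Hypothetical helper: uses the backward recursion
    G_t = R_{t+1} + gamma * G_{t+1}, with G_T = 0 at the terminal state.
    """
    returns = [0.0] * len(rewards)
    g = 0.0  # G_T = 0: no reward is collected after the terminal state
    for t in reversed(range(len(rewards))):
        # rewards[t] holds R_{t+1}, the reward received after acting at step t
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


# Episode 1: R1..R7 = 0, R8 = 1 (actions at t = 0..7, terminal at t = 8)
rewards_ep1 = [0, 0, 0, 0, 0, 0, 0, 1]
gamma = 0.9  # assumed value for illustration only

G = discounted_returns(rewards_ep1, gamma)
for t, g in enumerate(G):
    # Each G_t should equal gamma^(7 - t) * R8 = gamma^(7 - t)
    print(f"G_{t} = {g:.6f}   gamma^{7 - t} = {gamma ** (7 - t):.6f}")
```

Working backwards from the terminal state computes all returns in a single pass, instead of re-summing the tail of the reward sequence for every t.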

Similarly, for episode 2, t goes from 0 to 10, and R1=R2=...=R9=0, while R10=-1:

  • At t=0: S_0 = A => G_0 = R1 + gamma * R2 + ... + gamma^9 * R10 = gamma^9 * R10
  • At t=1: S_1 = B => G_1 = R2 + gamma * R3 + ... + gamma^8 * R10 = gamma^8 * R10
  • At t=2: S_2 = B => G_2 = R3 + gamma * R4 + ... + gamma^7 * R10 = gamma^7 * R10
  • At t=3: S_3 = B => G_3 = R4 + gamma * R5 + ... + gamma^6 * R10 = gamma^6 * R10
  • At t=4: S_4 = B => G_4 = R5 + gamma * R6 + ... + gamma^5 * R10 = gamma^5 * R10
  • At t=5: S_5 = B => G_5 = R6 + gamma * R7 + ... + gamma^4 * R10 = gamma^4 * R10
  • At t=6: S_6 = B => G_6 = R7 + gamma * R8 + ... + gamma^3 * R10 = gamma^3 * R10
  • At t=7: S_7 = B => G_7 = R8 + gamma * R9 + gamma^2 * R10 = gamma^2 * R10
  • At t=8: S_8 = B => G_8 = R9 + gamma * R10 = gamma * R10
  • At t=9: S_9 = B => G_9 = R10 = -1
  • Terminal state T
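The episode 2 closed form G_t = gamma^(9-t) * R10 can be checked numerically. This sketch sums the definition G_t = R_{t+1} + gamma * R_{t+2} + ... directly for each t and compares the result with the closed form. `return_at` is a hypothetical helper, and gamma = 0.9 is an assumed value, not one taken from the book.

```python
def return_at(rewards, t, gamma):
    """Hypothetical helper: G_t = sum_k gamma^k * R_{t+k+1}.

    rewards[i] holds R_{i+1}, so rewards[t:] is R_{t+1}, ..., R_T.
    """
    return sum(gamma ** k * r for k, r in enumerate(rewards[t:]))


# Episode 2: R1..R9 = 0, R10 = -1 (actions at t = 0..9, terminal at t = 10)
rewards_ep2 = [0] * 9 + [-1]
gamma = 0.9  # assumed value for illustration only

for t in range(len(rewards_ep2)):
    direct = return_at(rewards_ep2, t, gamma)
    closed_form = gamma ** (9 - t) * rewards_ep2[-1]  # gamma^(9-t) * R10
    print(f"t={t}: G_{t} = {direct:.6f}   gamma^{9 - t} * R10 = {closed_form:.6f}")
```

Summing the definition directly is less efficient than a backward recursion, but it mirrors the expanded sums written out in the list line by line.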

vmirly, May 22 '20 23:05

@vmirly I posted this erratum via Packt, but it was annoying as heck to explain without an image; they just said, "Can you explain further?", and I thought, fuck it. Having a corrected answer will help my understanding of this part. I'll read it alongside my copy of the book when I have time and let you know how it follows :) Also, the RL chapter is still great, but it feels rushed compared with the rest of the book. Maybe give it a bit more love in the 4th ed.? xx

elfelround, May 24 '20 18:05

> also the RL chapter is still great, but feels rushed in comparison with whole book, maybe a bit more love to it on 4th ed? xx

Oh yeah, for sure. The rewrites for TF 2.0 took much longer than expected, and we were both very busy in the fall (due to my teaching responsibilities and Vahid starting a new position). It definitely could and should be smoothed out in a potential next edition. Thanks for your feedback!

rasbt, Jul 28 '20 17:07