ᴸᵘᶜʳᵉᶜᵉ ˢʰⁱⁿ

Results 3 comments of ᴸᵘᶜʳᵉᶜᵉ ˢʰⁱⁿ

How about just: `index = randint(4, vocab_size - 1)`
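
A minimal sketch of that one-liner, assuming `randint` is Python's standard-library `random.randint` (which includes both endpoints) and that indices 0 to 3 are reserved, for example for special tokens. The helper name `sample_token_index` and the `num_reserved` parameter are hypothetical, introduced only for illustration.

```python
import random


def sample_token_index(vocab_size, num_reserved=4):
    """Draw a token index uniformly from [num_reserved, vocab_size - 1].

    Assumption: indices below num_reserved are reserved and must be skipped.
    """
    # random.randint includes BOTH endpoints, hence the `vocab_size - 1`.
    return random.randint(num_reserved, vocab_size - 1)


# Draw many samples from a toy vocabulary of size 10 and check the range.
samples = [sample_token_index(10) for _ in range(1000)]
print(min(samples), max(samples))
```

If the code in question uses `numpy.random.randint` instead, the upper bound is exclusive, so the equivalent call would be `randint(4, vocab_size)`.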

I agree. Each log p(A_t|S_t) should be multiplied by the **expected return G_t (sum of FUTURE rewards)**, which decreases as t increases, not by the cumulative episode reward R, which is the same for...
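
To make the distinction concrete, here is a small sketch that computes the return G_t at each step and compares it with the total episode reward R. The function name `returns_to_go`, the `gamma` discount parameter, and the toy reward list are assumptions for illustration, not taken from the code under discussion.

```python
def returns_to_go(rewards, gamma=1.0):
    """Return [G_0, ..., G_{T-1}], where G_t = sum_{k >= t} gamma**(k - t) * rewards[k]."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rewards = [1.0, 1.0, 1.0]    # toy episode with a reward of 1 at every step
G = returns_to_go(rewards)   # per-step weights for log p(A_t|S_t)
R = sum(rewards)             # single episode total, identical at every step
print(G)  # [3.0, 2.0, 1.0]
print(R)  # 3.0
```

In this toy episode the weight on log p(A_t|S_t) falls from 3 to 1 as t increases, while weighting by R would apply 3 to every step.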