ᴸᵘᶜʳᵉᶜᵉ ˢʰⁱⁿ
3 comments
How about just: `index = randint(4, vocab_size - 1)`
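A minimal sketch of that one-liner, assuming Python's `random.randint` and assuming (hypothetically; the thread doesn't say) that ids `0..3` are reserved special tokens. `randint` includes both endpoints, which is why the upper bound is `vocab_size - 1`. The function name `sample_token_index` and the `num_reserved` parameter are illustrative, not from the original.

```python
import random


def sample_token_index(vocab_size: int, num_reserved: int = 4) -> int:
    """Draw a uniform random token id, skipping the first `num_reserved` ids.

    Hypothetical assumption: ids 0..num_reserved-1 are reserved special tokens.
    random.randint(a, b) includes BOTH a and b, hence the `- 1`.
    """
    if vocab_size <= num_reserved:
        raise ValueError("vocab_size must exceed the number of reserved ids")
    return random.randint(num_reserved, vocab_size - 1)


# With vocab_size=10, every draw falls in 4..9 inclusive.
draws = [sample_token_index(10) for _ in range(20)]
print(draws)
```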
I agree. The **expected return G_t (sum of FUTURE rewards)** should multiply each log p(A_t|S_t). G_t decreases as t increases (for non-negative rewards), unlike the cumulative per-episode reward R, which is the same for...
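A rough sketch of the difference, in plain Python with per-step log-probabilities as floats (a real implementation would use autodiff tensors). The discount factor `gamma`, and the names `rewards_to_go` and `reinforce_loss`, are assumptions for illustration; with `gamma=1.0`, G_t is simply the sum of rewards from step t onward.

```python
from typing import List


def rewards_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """G_t = sum_{k>=t} gamma^(k-t) * r_k, computed backwards in one pass."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs: List[float], rewards: List[float],
                   gamma: float = 1.0, use_reward_to_go: bool = True) -> float:
    """Surrogate loss -sum_t w_t * log p(A_t|S_t).

    use_reward_to_go=True:  w_t = G_t (only rewards from step t onward).
    use_reward_to_go=False: w_t = R = G_0 (same whole-episode return every step).
    """
    weights = rewards_to_go(rewards, gamma)
    if not use_reward_to_go:
        weights = [weights[0]] * len(weights) if weights else []
    return -sum(w * lp for w, lp in zip(weights, log_probs))


print(rewards_to_go([1.0, 1.0, 1.0]))  # G_t shrinks as t grows
```

Weighting by G_t drops rewards earned before step t, which cannot have been caused by A_t, so the gradient estimate stays unbiased while typically having lower variance than weighting every step by R.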
Such as?