Probabilistic-Programming-and-Bayesian-Methods-for-Hackers Chapter 5 bandits strategies

[Open] gustavosantos opened this issue 7 years ago • 0 comments

I think there are two problems with the strategies used in Chapter 5:

1. The text suggests that strategies 4 and 5 are distinct and that both will be applied. Which of them does max_mean correspond to? As far as I can tell, none of the other strategies that are applied (upper_credible_choice, bayesian_bandit_choice, ucb_bayes, random_choice) is strategy 4 or 5.
2. The max_mean strategy always chooses bandit 0. All bandits start with 0 wins, so every estimated mean is 0, and argmax breaks the tie by returning the first index, 0. If bandit 0 loses, all estimates are still 0 and it is chosen again. Once it wins, its win rate is above 0 while the other bandits, never having been played, stay at 0, so max_mean keeps choosing bandit 0 forever.
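The tie-breaking argument above can be checked with a small simulation. This is a minimal sketch, not the chapter's code: it assumes max_mean scores each bandit by wins / (trials + 1) and takes the argmax, and the win probabilities in the hypothetical `true_p` array are made up for illustration, with bandit 0 deliberately the worst. Because `np.argmax` returns the first index among ties, every pull goes to bandit 0 whether or not it ever wins.

```python
import numpy as np

# Hypothetical win probabilities (assumption for illustration only);
# bandit 0 is deliberately the worst arm.
true_p = np.array([0.3, 0.5, 0.8])

def max_mean(wins, trials):
    # Greedy choice: highest observed win proportion.
    # Assumed scoring rule: wins / (trials + 1), where +1 avoids 0/0.
    # np.argmax returns the FIRST index when several values tie.
    return int(np.argmax(wins / (trials + 1)))

def run(n_pulls, seed=0):
    rng = np.random.default_rng(seed)
    wins = np.zeros(len(true_p))
    trials = np.zeros(len(true_p))
    choices = []
    for _ in range(n_pulls):
        k = max_mean(wins, trials)
        choices.append(k)
        # Bernoulli reward: win with probability true_p[k].
        wins[k] += float(rng.random() < true_p[k])
        trials[k] += 1
    return choices, trials

choices, trials = run(1000)
print(set(choices))  # {0}
print(trials)        # all 1000 pulls on bandit 0, none on bandits 1 and 2
```

Changing the seed or `true_p` does not change the outcome: bandits 1 and 2 are never pulled, so their estimates never move off 0, and bandit 0's estimate is either tied at 0 (still chosen by argmax) or above 0.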

Aug 14 '18 23:08 gustavosantos