Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
Chapter 5 bandit strategies
I think there are two problems with the strategies adopted in chapter 5:
- The text suggests that strategies 4 and 5 are distinct and that both will be applied. Which of them does the strategy max_mean refer to? I think the other strategies applied (upper_credible_choice, bayesian_bandit_choice, ucb_bayes, random_choice) are neither 4 nor 5.
- The strategy "max_mean" always chooses bandit 0. All bandits are initialized with 0 wins, so every observed win rate is tied at 0 and argmax returns the first index, 0. Bandit 0 will probably win eventually, and once its win rate is higher than 0 while the others remain at 0, it will keep being chosen forever.
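
To illustrate the second point, here is a minimal sketch of what I believe is happening. It assumes max_mean picks the arm with the highest observed win rate via `np.argmax`; the function names `max_mean_choice` and `simulate`, the `+ 1` in the denominator, and the win probabilities are my own assumptions for the demonstration, not code copied from the chapter.

```python
import numpy as np

def max_mean_choice(wins, trials):
    # Assumed form of the strategy: highest observed win rate wins.
    # The +1 only avoids dividing by zero before any pull (an assumption).
    # np.argmax returns the FIRST index when several values are tied.
    return int(np.argmax(wins / (trials + 1)))

def simulate(probs, n_pulls, seed=0):
    # Hypothetical test harness: pull arms according to max_mean_choice.
    rng = np.random.default_rng(seed)
    wins = np.zeros(len(probs))
    trials = np.zeros(len(probs))
    for _ in range(n_pulls):
        arm = max_mean_choice(wins, trials)
        reward = rng.random() < probs[arm]  # Bernoulli draw for the chosen arm
        wins[arm] += reward
        trials[arm] += 1
    return wins, trials

# Arm 0 is deliberately the worst arm here.
wins, trials = simulate(probs=[0.1, 0.5, 0.9], n_pulls=1000)
print(trials)  # number of pulls per arm
```

Under these assumptions arms 1 and 2 are never pulled, even though both are better than arm 0: their rate stays at 0, and arm 0 either ties at 0 (and wins the tie by index) or leads with a positive rate.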