multi-armed-bandit
UCB1's estimates update
In solvers.py, line 97: self.estimates[i] += 1. / (self.counts[i] + 1) * (r - self.estimates[i])
I think it should be: self.estimates[i] = payoff[i] / (self.counts[i] + 1)
Could you please explain it? Thanks!
Hello Haotian,
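I think they are equivalent. Line 97 is the incremental form of the running mean: it moves the current estimate toward the new reward r by 1 / (self.counts[i] + 1) of the difference between them. Assuming payoff[i] is the cumulative reward of arm i including r, and self.counts[i] is the number of earlier pulls, that gives the same value as your statement. Thanks.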
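To make the equivalence concrete, here is a minimal sketch rather than the repository's code. The names incremental_mean and batch_mean and the init value are hypothetical, and it assumes the pull count is incremented only after the estimate is updated, as the expression on line 97 suggests.

```python
import random


def incremental_mean(rewards, init=1.0):
    """Running mean using the same update form as line 97 (hypothetical helper)."""
    estimate = init  # assumed initial estimate; it is overwritten by the first reward
    count = 0        # number of pulls seen before the current one
    for r in rewards:
        estimate += 1.0 / (count + 1) * (r - estimate)
        count += 1
    return estimate


def batch_mean(rewards):
    """Cumulative payoff divided by the number of pulls (hypothetical helper)."""
    return sum(rewards) / len(rewards)


random.seed(0)
# Simulated Bernoulli rewards for a single arm with success probability 0.6.
rewards = [float(random.random() < 0.6) for _ in range(1000)]

# The two forms agree up to floating-point rounding.
assert abs(incremental_mean(rewards) - batch_mean(rewards)) < 1e-9
```

On the first pull the step size is 1, so the estimate becomes exactly r and the initial value no longer matters. The incremental form also avoids keeping a separate cumulative payoff per arm.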
Why is there no plot output when I run it?