multi-armed-bandit

UCB1's estimates update

Open Haotian-CS opened this issue 5 years ago • 2 comments

In solvers.py, line 97: self.estimates[i] += 1. / (self.counts[i] + 1) * (r - self.estimates[i])

I think it should be: self.estimates[i] = payoff[i] / (self.counts[i] + 1)

Could you please explain it? Thanks!

Haotian-CS, Jan 11 '20 12:01

Hello Haotian,

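I think they are equivalent. Line 97 is an incremental update of the running mean: it adds to the estimate at time t-1 the difference between the new reward r and that estimate, scaled by 1 / (self.counts[i] + 1). Assuming self.counts[i] is the number of pulls of arm i before this update and payoff[i] is the total reward of arm i including r, this gives the same value as your statement. Thanks.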
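
Here is a small sketch to check this numerically. The function names and the simulated Bernoulli rewards are my own assumptions for illustration, not code from solvers.py, and it assumes the count is incremented only after the estimate is updated.

```python
import random


def incremental_mean(rewards, init=0.0):
    """Running mean written in the same form as line 97."""
    estimate = init
    count = 0  # number of rewards seen before the current one
    for r in rewards:
        estimate += 1.0 / (count + 1) * (r - estimate)
        count += 1
    return estimate


def batch_mean(rewards):
    """Total payoff divided by the number of pulls (the form proposed in the question)."""
    payoff = 0.0
    count = 0
    estimate = 0.0
    for r in rewards:
        payoff += r
        estimate = payoff / (count + 1)
        count += 1
    return estimate


random.seed(0)
# Hypothetical arm that pays 1 with probability 0.6, otherwise 0.
rewards = [float(random.random() < 0.6) for _ in range(1000)]

a = incremental_mean(rewards)
b = batch_mean(rewards)
print(abs(a - b) < 1e-9)
```

Note that on the first pull the step size is 1 / (0 + 1) = 1, so the first update replaces whatever initial estimate was used with the first reward. That is why the incremental form matches the plain average regardless of the starting value.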

Jayzhaowj, Jun 14 '21 08:06

Why is there no plot output when I run it?

zhengshuai202, Oct 30 '21 08:10