reinforcement-learning-an-introduction

Why doesn't figure2_3 in ten_armed_testbed.py use "sample_averages"?

Open A-Pai opened this issue 3 years ago • 0 comments

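According to the book, a constant step size is used because the problem is nonstationary. But the arms in this code are stationary, so why not use "sample_averages" to estimate the value of each arm? The two approaches give quite different results. The upper figure uses the constant step size; the "sample_averages" method clearly converges faster, but why do the two also converge to different values?

[Figures: figure2_3, figure2_3_1]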
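For reference, here is a minimal sketch of the two update rules being compared. It is not the code from ten_armed_testbed.py: the function names, the single-arm setup, the unit-variance Gaussian rewards, and the seed are all assumptions made for illustration.

```python
import random


def sample_average_update(q, n, reward):
    """Incremental sample average; n is the pull count including this reward."""
    return q + (reward - q) / n


def constant_step_update(q, reward, alpha=0.1):
    """Exponential recency-weighted average with constant step size alpha."""
    return q + alpha * (reward - q)


def estimate_arm(true_value, initial_q, steps, alpha=0.1, seed=0):
    """Estimate one stationary arm with both rules from the same reward stream.

    Hypothetical helper: rewards are drawn from N(true_value, 1).
    Returns (sample_average_estimate, constant_step_estimate).
    """
    rng = random.Random(seed)
    q_avg = initial_q
    q_const = initial_q
    for n in range(1, steps + 1):
        reward = rng.gauss(true_value, 1.0)
        q_avg = sample_average_update(q_avg, n, reward)
        q_const = constant_step_update(q_const, reward, alpha)
    return q_avg, q_const


if __name__ == "__main__":
    # Same arm, optimistic vs. zero initial estimate, after a single pull.
    print(estimate_arm(true_value=1.0, initial_q=5.0, steps=1))
    print(estimate_arm(true_value=1.0, initial_q=0.0, steps=1))
```

With the sample-average rule the first step size is 1, so the initial estimate is overwritten by the first reward. With a constant step size the initial estimate keeps a weight of (1 - alpha)^n after n pulls. If figure2_3 is the optimistic-initial-values experiment, as Figure 2.3 is in the book, switching to "sample_averages" would erase the optimistic start after each arm's first pull, which may be part of why the two sets of curves differ.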

A-Pai, Jul 08 '21 02:07