reinforcement-learning-an-introduction

Published 20 hours ago

Readme
Issues

Why doesn't figure2_3 in ten_armed_testbed.py use "sample_averages"?

Open A-Pai opened this issue 3 years ago • 0 comments

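According to the book, a constant step size is used when the problem is nonstationary. But the arms in the code are set up as stationary, so why not use "sample_averages" to estimate the value of each arm? The two methods give quite different results. The upper figure uses the constant step size; the "sample_averages" method clearly converges faster, but why do the two also converge to different values?

(Attached images: figure2_3, figure2_3_1)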
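To make the comparison concrete, here is a minimal sketch of the two update rules the question is about, applied to a single stationary arm. It is not taken from ten_armed_testbed.py: the function name `estimate`, its parameters, the reward distribution, and the initial estimate of 5.0 are all assumptions chosen for illustration.

```python
import random
import statistics


def estimate(rewards, step_size=None, initial=0.0):
    """Incrementally estimate one arm's value from a sequence of rewards.

    step_size=None uses the sample average (step size 1/n);
    any other value is used as a constant step size.
    Hypothetical helper for illustration only.
    """
    q = initial
    for n, r in enumerate(rewards, start=1):
        alpha = 1.0 / n if step_size is None else step_size
        q += alpha * (r - q)  # Q <- Q + alpha * (R - Q)
    return q


# One stationary arm: rewards drawn from a fixed Gaussian (assumed setup).
rng = random.Random(0)
true_value = 1.0
rewards = [rng.gauss(true_value, 1.0) for _ in range(1000)]

# Sample average: equals the plain mean of all rewards seen so far.
q_avg = estimate(rewards, initial=5.0)

# Constant step size: an exponentially recency-weighted average.
q_const = estimate(rewards, step_size=0.1, initial=5.0)

print("mean of rewards :", statistics.fmean(rewards))
print("sample average  :", q_avg)
print("constant alpha  :", q_const)
```

With the sample average, the first update uses a step size of 1, so the initial estimate is overwritten by the first reward and has no further effect. With a constant step size, the initial estimate keeps a weight of (1 - alpha)^n after n updates, and the estimate keeps tracking the most recent rewards instead of settling on the overall mean. Whether this accounts for the gap between the two figures depends on how the script sets the initial estimates, which the issue does not show.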

Jul 08 '21 02:07 A-Pai