reinforcement-learning-an-introduction
Why doesn't figure2_3 in ten_armed_testbed.py use "sample_averages"?
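According to the book, a constant step size is used because the problem is nonstationary, but the arms in this code are set up as stationary. So why not use "sample_averages" to estimate the value of each arm? The two methods give quite different results. The upper plot uses a constant step size. The "sample_averages" method clearly converges faster, but why do the two methods also converge to different levels?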
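To make the comparison concrete, here is a minimal sketch of the two incremental update rules applied to a single stationary arm. It is not the code from ten_armed_testbed.py: the function name `estimate_arm_value`, its parameters, and the reward distribution (mean 1.0, standard deviation 1.0) are assumptions chosen only for illustration.

```python
import random


def estimate_arm_value(rewards, step_size=None, initial=0.0):
    """Incrementally estimate one arm's value from a sequence of rewards.

    step_size=None uses the sample-average rule (step 1/n);
    otherwise a constant step size is used for every update.
    Hypothetical helper, not part of ten_armed_testbed.py.
    """
    q = initial
    for n, reward in enumerate(rewards, start=1):
        alpha = 1.0 / n if step_size is None else step_size
        q += alpha * (reward - q)  # Q <- Q + alpha * (R - Q)
    return q


random.seed(0)
# Stationary arm: rewards drawn from a fixed distribution (assumed values).
rewards = [random.gauss(1.0, 1.0) for _ in range(1000)]

q_sample_average = estimate_arm_value(rewards)
q_constant_step = estimate_arm_value(rewards, step_size=0.1, initial=5.0)

print("sample average:", q_sample_average)
print("constant step :", q_constant_step)
```

With the sample-average rule the first update uses a step of 1, so the initial estimate is overwritten immediately and the result equals the arithmetic mean of the rewards. With a constant step size the estimate is an exponentially weighted average: the initial estimate keeps a weight of (1 - alpha)^n after n updates, and recent rewards count more than older ones.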