HandyRL

(Idea) feature: proportional accept rate during all phases

Open YuriCat opened this issue 3 years ago • 0 comments

So far, the accept rate for episodes in the replay buffer has been linear, scaled by maximum_episodes. This means that, until the buffer is filled, the earliest episodes are selected many times.

Even if this slightly reduces the diversity within each batch, it would be better to use a weight proportional to the current number of episodes, so that the earliest episodes are less likely to be selected.
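
To illustrate the difference, here is a minimal sketch. It assumes episodes are drawn by rejection sampling with an accept rate that falls linearly with episode age. The function names and the sampling loop are hypothetical and are not taken from HandyRL's code; only `maximum_episodes` comes from the description above.

```python
import random


def accept_rate_fixed(ep_idx, ep_count, maximum_episodes):
    """Assumed current rule: age is scaled by the buffer capacity."""
    age = ep_count - 1 - ep_idx  # 0 for the newest episode
    return 1 - age / maximum_episodes


def accept_rate_proportional(ep_idx, ep_count):
    """Proposed rule: age is scaled by the current number of episodes."""
    age = ep_count - 1 - ep_idx
    return 1 - age / ep_count


def sample_index(ep_count, rate_fn, rng):
    """Rejection sampling: draw a uniform index, keep it with prob. rate_fn."""
    while True:
        ep_idx = rng.randrange(ep_count)
        if rng.random() < rate_fn(ep_idx, ep_count):
            return ep_idx


if __name__ == "__main__":
    ep_count, maximum_episodes = 100, 10000  # buffer only 1% full
    # Oldest episode (index 0) under each rule:
    print(accept_rate_fixed(0, ep_count, maximum_episodes))  # about 0.99
    print(accept_rate_proportional(0, ep_count))             # about 0.01

    rng = random.Random(0)
    draws = [sample_index(ep_count, accept_rate_proportional, rng)
             for _ in range(20000)]
    oldest = sum(1 for i in draws if i < 10)
    newest = sum(1 for i in draws if i >= ep_count - 10)
    print(oldest, newest)  # the newest episodes are drawn far more often
```

Under these assumptions, while the buffer is still small, the fixed rule accepts the oldest episode almost as often as the newest one, whereas the proportional rule keeps the same old-to-new ratio in every phase.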

YuriCat, Jul 26 '22 08:07