HandyRL
(Idea) feature: proportional accept rate during all phases
So far, the acceptance rate when sampling from the replay buffer has been linear in episode age, scaled by maximum_episodes. This means that, until the buffer is filled, the earliest episodes are accepted almost as often as the newest ones and end up being selected many times.
Even if this slightly reduces the diversity within each batch, it would be better to use a weight proportional to the current number of episodes, so that the earliest episodes are less likely to be selected.
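To make the difference concrete, here is a minimal sketch of the two rules. It assumes the current rule has the form 1 - age / maximum_episodes, where age is how many episodes are newer than the candidate. The function names `accept_rate` and `select_index` and the `proportional` flag are hypothetical and only for illustration; they are not taken from the HandyRL code.

```python
import random


def accept_rate(ep_idx, ep_count, denominator):
    """Linear acceptance rate; the newest episode (ep_idx == ep_count - 1) gets 1.0."""
    age = ep_count - 1 - ep_idx  # number of episodes newer than this one
    return 1 - age / denominator


def select_index(ep_count, maximum_episodes, proportional, rng=random):
    """Rejection-sample an episode index (illustrative sketch, not the library's code).

    proportional=False: scale by maximum_episodes (assumed current behaviour).
    proportional=True:  scale by the current number of episodes (proposed).
    """
    ep_count = min(ep_count, maximum_episodes)
    denominator = ep_count if proportional else maximum_episodes
    while True:
        ep_idx = rng.randrange(ep_count)
        if rng.random() < accept_rate(ep_idx, ep_count, denominator):
            return ep_idx
```

For example, with 100 episodes in a buffer whose maximum_episodes is 10000, the oldest episode is accepted with probability about 0.99 under the first rule and 0.01 under the proportional one. Once the buffer is full, both rules coincide.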