minimalRL
A naive question about updating parameters in DDPG.
Hi, first of all, thanks for your awesome code. This isn't about a technical issue, but about the algorithm in the DDPG code.
As far as I know, DDPG supports online (per-step) parameter updates, since it is based on TD learning. In your code, however, the parameters are updated only after an episode ends.
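To make the contrast concrete, here is a minimal sketch of the two schedules I mean (the names `count_updates`, `warmup`, and `per_step` are illustrative, not the repo's actual API). Under both schedules the same total number of gradient steps can be taken; the difference is only whether they happen during the episode or in a burst afterwards:

```python
def count_updates(episode_len, warmup, per_step):
    """Count gradient updates for one episode under each schedule."""
    updates = 0
    for t in range(episode_len):
        # ... env.step(action), store transition in the replay buffer ...
        if per_step and t >= warmup:
            updates += 1  # online: one TD update per environment step
    if not per_step:
        # episode-end style: do the same number of updates in one burst
        updates += max(0, episode_len - warmup)
    return updates

print(count_updates(200, 50, per_step=True))   # 150
print(count_updates(200, 50, per_step=False))  # 150
```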
I would like to ask whether there is any theoretical background behind this choice of parameter-update interval.
Thank you in advance.