open_spiel
On the acceleration of convergence process in neural replicator dynamics
Hi guys, I was wondering whether it is possible to accelerate convergence in Neural Replicator Dynamics (NeuRD). In this paper, https://arxiv.org/abs/1906.00190, the accumulated values y_t at time t are approximated by Euler discretization. As far as I know, a second-order approximation is also available; could it be used to accelerate convergence?
The analysis is as follows:
The two-step Adams-Bashforth formula (a second-order approximation) for an ODE dy/dt = f(t, y) is:
y_{k+1} = y_k + h * (3/2 * f(t_k, y_k) - 1/2 * f(t_{k-1}, y_{k-1}))
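For concreteness, here is a minimal sketch (my own illustration, not from the paper) comparing forward Euler with the two-step formula above on a toy ODE dy/dt = -y; the dynamics, step size, and horizon are arbitrary choices just to show the accuracy difference:

```python
# Toy comparison of forward Euler vs. two-step Adams-Bashforth on dy/dt = -y.
import numpy as np

def f(t, y):
    # Toy dynamics with exact solution y(t) = y0 * exp(-t).
    return -y

def euler(y0, h, num_steps):
    y = y0
    for k in range(num_steps):
        y = y + h * f(k * h, y)  # first-order update
    return y

def adams_bashforth2(y0, h, num_steps):
    # Bootstrap the first step with Euler, then apply
    # y_{k+1} = y_k + h * (3/2 * f(t_k, y_k) - 1/2 * f(t_{k-1}, y_{k-1})).
    y_prev, y = y0, y0 + h * f(0.0, y0)
    for k in range(1, num_steps):
        y_next = y + h * (1.5 * f(k * h, y) - 0.5 * f((k - 1) * h, y_prev))
        y_prev, y = y, y_next
    return y

h, T = 0.1, 1.0
n = int(T / h)
exact = np.exp(-T)
print("Euler error:", abs(euler(1.0, h, n) - exact))
print("AB2 error:  ", abs(adams_bashforth2(1.0, h, n) - exact))
```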
In the NeuRD paper, the policy update comes from minimizing the logit error, and the logit estimates are given by Euler discretization, which is a first-order method. The two-step Adams-Bashforth method is second order: it satisfies the root condition, its local truncation error is second order, it is convergent by the Dahlquist equivalence theorem, and its global error is also second order, so it is more accurate than Euler discretization. That should allow a larger learning step size. Applied to NeuRD, this update would mean that the critic network provides advantage-function estimates not only at time t but also at the previous time t-1 (see the sketch below). Does this mean the policy would converge faster?
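To make the proposal concrete, here is a rough sketch of how a tabular logit update would change if the Euler step were replaced by the two-step Adams-Bashforth step. This is my own illustration, not code from the paper or open_spiel, and `advantage_t` / `advantage_tm1` are hypothetical placeholders for the critic's advantage estimates at consecutive iterations:

```python
# Sketch: NeuRD-style cumulative logit update, Euler vs. two-step Adams-Bashforth.
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def euler_logit_update(logits, advantage_t, step_size):
    # First-order (Euler) discretization of the replicator logit dynamics.
    return logits + step_size * advantage_t

def ab2_logit_update(logits, advantage_t, advantage_tm1, step_size):
    # Two-step Adams-Bashforth discretization: combines the current and the
    # previous advantage estimates, as proposed in the question.
    return logits + step_size * (1.5 * advantage_t - 0.5 * advantage_tm1)

# Toy usage with made-up advantage vectors for a 3-action policy.
logits = np.zeros(3)
adv_tm1 = np.array([0.1, -0.05, 0.0])
adv_t = np.array([0.2, -0.1, 0.05])
print(softmax(euler_logit_update(logits, adv_t, step_size=1.0)))
print(softmax(ab2_logit_update(logits, adv_t, adv_tm1, step_size=1.0)))
```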
Any suggestion is appreciated.
That is a neat idea, you should try it!
Tagging some potentially interested parties: @dmorrill10 @dhennes @shayegano @perolat