jtwin
In the paper and the OpenSpiel implementation, the NEuRD clip value is set to 10k. https://github.com/google-deepmind/open_spiel/blob/931e39a99ee73412500def0227925f8f19f033fe/open_spiel/python/algorithms/rnad/rnad.py#L604 From my testing, this is a major source of instability, since the v-trace operator...
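To make the clipping concrete, here is a minimal sketch of a NEuRD-style update combining an advantage clip with a logit threshold. The function name, the beta default, and the exact masking rule are assumptions for illustration only, not the OpenSpiel implementation:

```python
import numpy as np

def neurd_clipped_update(logits, advantages, clip=10_000.0, beta=2.0):
    """Hedged sketch of a NEuRD-style clipped logit update.

    Advantages are clipped to [-clip, clip]; updates only flow through
    logits that remain inside [-beta, beta], a thresholding trick NEuRD
    uses to keep logits bounded.
    """
    clipped_adv = np.clip(advantages, -clip, clip)
    # Block updates that would push a logit further past +/-beta.
    can_increase = logits < beta
    can_decrease = logits > -beta
    mask = np.where(clipped_adv >= 0, can_increase, can_decrease)
    return logits + clipped_adv * mask
```

With clip set to 10k, a v-trace-corrected advantage of 50,000 still produces a step of 10,000 on the logit, which is one way to see why such a large clip value can destabilise training.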
In the example for RNaD, the importance-sampling correction for get_loss_nerd is 1. This is because the example provided is the on-policy case, with synchronous updates of the...
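A short sketch of why the correction is 1 in the on-policy case: the per-step importance-sampling ratio is pi(a|s)/mu(a|s), and when the behaviour policy mu is the current policy pi, every ratio is exactly 1. The helper below is hypothetical, not a function from the repo:

```python
import numpy as np

def importance_ratio(pi_probs, mu_probs, actions):
    """Per-step importance-sampling ratio pi(a|s) / mu(a|s).

    pi_probs, mu_probs: arrays of shape [T, num_actions] with action
    probabilities under the target and behaviour policies.
    actions: array of shape [T] with the sampled action indices.
    """
    idx = np.arange(len(actions))
    return pi_probs[idx, actions] / mu_probs[idx, actions]
```

On-policy, passing the same array for both policies yields a vector of ones, which is why the example can hard-code the correction as a constant 1.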
Based on the formulae in the paper, the reward transformation is given by adding the log policy ratio. However, the code contains an entropy term instead. https://github.com/deepmind/open_spiel/blob/db0f4a78b1fd0bee0263d46d62fb4d693897329e/open_spiel/python/algorithms/rnad/rnad.py#L422 Which one is...
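For reference, the paper's transformation can be sketched as follows. The eta value and the sign convention are assumptions here, and this is not a claim about what the code at the link actually computes:

```python
import numpy as np

def transformed_reward_paper(r, pi, pi_reg, eta=0.2):
    """Sketch of the R-NaD reward transformation as stated in the paper:
    adjust the reward by eta * log(pi(a|s) / pi_reg(a|s)), where pi_reg
    is the regularisation policy. Sign convention assumed here: the
    acting player's reward is penalised for moving away from pi_reg.
    """
    return r - eta * np.log(pi / pi_reg)
```

One observation that may explain the discrepancy: when pi_reg is uniform, log(pi / pi_reg) equals log(pi) plus a constant, so a log-pi (entropy-like) term and the log ratio differ only by a constant in that special case; with a non-uniform pi_reg they genuinely differ.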
I am running a local Showdown server for use in a deep reinforcement learning algorithm. I am noticing that the Showdown server itself is the bottleneck for learning in terms...