open_spiel
RNaD negative loss and barely any correlation of loss with NashConv
I am training RNaDSolver on Kuhn Poker with this config:
from open_spiel.python.algorithms.rnad import rnad

config = rnad.RNaDConfig(
    game_name="kuhn_poker",
    trajectory_max=10,
    state_representation=rnad.StateRepresentation.INFO_SET,
    policy_network_layers=(128,),
    batch_size=256,
    learning_rate=1e-6,
    seed=42,
)
Two things I cannot explain:
- Some of the loss values are negative.
- The loss decreases overall, but it does not really correlate with the NashConv metric. At the beginning the two move together, but then the loss stagnates while NashConv keeps decreasing smoothly, reaching 0.008.
The second point is a problem because I cannot compute NashConv for more complicated games, so I am not sure what I should monitor to identify convergence.
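For context, by NashConv I mean the sum over players of how much each player could gain by switching to a best response against the others. Below is a minimal pure-Python sketch of that definition on a small two-player zero-sum matrix game. It is not Kuhn Poker and not the open_spiel implementation; the function name `nash_conv_matrix_game` and the rock-paper-scissors payoff matrix are illustrative assumptions.

```python
def nash_conv_matrix_game(payoff, row_policy, col_policy):
    """NashConv for a two-player zero-sum matrix game (illustrative helper).

    payoff[i][j] is the row player's payoff; the column player gets -payoff[i][j].
    """
    n_rows, n_cols = len(payoff), len(payoff[0])

    # Expected payoff of each pure row action against the column policy.
    row_action_values = [
        sum(payoff[i][j] * col_policy[j] for j in range(n_cols))
        for i in range(n_rows)
    ]
    # Expected payoff of each pure column action against the row policy.
    col_action_values = [
        sum(-payoff[i][j] * row_policy[i] for i in range(n_rows))
        for j in range(n_cols)
    ]

    # Value of the current joint policy for each player (zero-sum).
    row_value = sum(row_policy[i] * row_action_values[i] for i in range(n_rows))
    col_value = -row_value

    # Gain each player gets from deviating to a best response.
    row_gain = max(row_action_values) - row_value
    col_gain = max(col_action_values) - col_value
    return row_gain + col_gain


# Rock-paper-scissors, action order: rock, paper, scissors.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

print(nash_conv_matrix_game(RPS, uniform, uniform))
print(nash_conv_matrix_game(RPS, always_rock, uniform))
```

The uniform policy is the equilibrium of this game, so neither player can gain by deviating, while the always-rock policy can be exploited by the opponent. Computing the best-response terms is the part that needs a full traversal of the game, which is why I cannot do the same for larger games.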