dice_rl
Lagrangian estimation of the policy value (average reward) in neural_dice.py
The Lagrangian estimate of the policy value (average reward) in neural_dice.py is computed as
```python
lagrangian = nu_zero + self._norm_regularizer * self._lam + constraint
```
But according to the paper, I think it should be
```python
lagrangian = nu_zero + dual_step + self._norm_regularizer * self._lam + constraint
```
which also includes the dual estimate `dual_step`.
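To make the discrepancy concrete, here is a minimal sketch in plain Python. The variable names follow the lines quoted above; the scalar values are made up purely for illustration and stand in for the tensors computed in neural_dice.py:

```python
# Hypothetical scalar stand-ins for the tensors in neural_dice.py
# (values are illustrative, not taken from the library).
nu_zero = 0.8           # initial-state term of the Lagrangian
dual_step = 0.3         # dual (zeta-weighted) estimate, omitted in the current code
lam = 0.1               # normalization multiplier self._lam
norm_regularizer = 1.0  # weight self._norm_regularizer
constraint = -0.05      # remaining constraint term

# Current computation, as quoted above:
lagrangian_current = nu_zero + norm_regularizer * lam + constraint

# Proposed computation, including the dual estimate:
lagrangian_proposed = nu_zero + dual_step + norm_regularizer * lam + constraint

# The two estimates differ by exactly dual_step, so whenever the dual
# term is nonzero the reported policy-value estimate changes.
difference = lagrangian_proposed - lagrangian_current
```

With these stand-in numbers, `difference` equals `dual_step` (up to floating-point rounding), which is the whole disagreement between the two formulas.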