dice_rl

Lagrangian estimate of the policy value (average reward) in neural_dice.py

Open · haanvid opened this issue 4 years ago · 0 comments

The Lagrangian estimate of the policy value (average reward) in `neural_dice.py` is computed as

`lagrangian = nu_zero + self._norm_regularizer * self._lam + constraint`

But according to the paper, I think it should be

`lagrangian = nu_zero + dual_step + self._norm_regularizer * self._lam + constraint`

which also includes the dual estimate of the policy value (`dual_step`).
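To make the argument concrete, here is a minimal NumPy sketch, not the dice_rl code itself. It assumes that `constraint` is the mean of `zeta * (gamma * nu' - nu - norm_regularizer * lam)` with no reward term (which is how this issue reads the code), that every transition is non-terminal, that the policy ratio is 1, and it omits the f-regularizers on nu and zeta. The helpers `lagrangian_terms` and `full_lagrangian` and all data are hypothetical. Because the reward sits inside the Bellman residual of the regularized Lagrangian, linearity of the expectation splits off E[zeta * r], which is exactly `dual_step`.

```python
import numpy as np


def lagrangian_terms(nu_init, nu, next_nu, zeta, reward, gamma, norm_regularizer, lam):
    """Hypothetical helper: splits the Lagrangian into nu_zero, dual_step, constraint.

    Assumes non-terminal transitions, a policy ratio of 1, and a `constraint`
    that excludes the reward (as the issue reads neural_dice.py).
    """
    # (1 - gamma) * E_{s0 ~ mu0, a0 ~ pi}[nu(s0, a0)]
    nu_zero = (1 - gamma) * np.mean(nu_init)
    # E_D[zeta * r]: the dual (weighted-reward) estimate of the policy value.
    dual_step = np.mean(zeta * reward)
    # E_D[zeta * (gamma * nu' - nu - norm_regularizer * lam)], no reward term.
    constraint = np.mean(zeta * (gamma * next_nu - nu - norm_regularizer * lam))
    return nu_zero, dual_step, constraint


def full_lagrangian(nu_init, nu, next_nu, zeta, reward, gamma, norm_regularizer, lam):
    """Lagrangian written out directly, with the reward inside the residual."""
    residual = reward + gamma * next_nu - nu - norm_regularizer * lam
    return (1 - gamma) * np.mean(nu_init) + norm_regularizer * lam + np.mean(zeta * residual)


# Synthetic values standing in for network outputs on a batch of transitions.
rng = np.random.default_rng(0)
n = 1000
nu_init = rng.normal(size=n)
nu, next_nu = rng.normal(size=n), rng.normal(size=n)
zeta = rng.uniform(0.0, 2.0, size=n)
reward = rng.uniform(0.0, 1.0, size=n)
gamma, norm_regularizer, lam = 0.99, 1.0, 0.3

args = (nu_init, nu, next_nu, zeta, reward, gamma, norm_regularizer, lam)
nu_zero, dual_step, constraint = lagrangian_terms(*args)
current = nu_zero + norm_regularizer * lam + constraint               # formula in the code
proposed = nu_zero + dual_step + norm_regularizer * lam + constraint  # formula in this issue
target = full_lagrangian(*args)

print(np.isclose(proposed, target))             # True
print(np.isclose(target - current, dual_step))  # True
```

Under these assumptions the current formula falls short of the directly computed Lagrangian by exactly `dual_step`; if `constraint` in the real code already folds in the reward, the gap would disappear.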

haanvid, Dec 09 '20 15:12