Haanvid Lee issues

Results: 1 issue by Haanvid Lee

Lagrangian estimation of the policy value (average reward) in neural_dice.py

The Lagrangian estimate of the policy value (average reward) in neural_dice.py is computed as

    lagrangian = nu_zero + self._norm_regularizer * self._lam + constraint

But according to the paper, I...
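For reference, the quoted expression is a plain sum of three terms. The snippet below is only a minimal sketch of that arithmetic with scalar inputs. The function name `lagrangian_estimate` and its parameters are hypothetical stand-ins for `nu_zero`, `self._norm_regularizer`, `self._lam`, and `constraint`. It does not reproduce how neural_dice.py computes those quantities.

```python
import math


def lagrangian_estimate(nu_zero, norm_regularizer, lam, constraint):
    """Evaluate the expression quoted in the issue for scalar inputs.

    All arguments are hypothetical stand-ins for the values used in
    neural_dice.py; how they are obtained is not shown here.
    """
    # Same three terms as in the quoted line:
    # nu_zero + self._norm_regularizer * self._lam + constraint
    return nu_zero + norm_regularizer * lam + constraint


# Made-up numbers, purely to illustrate the arithmetic.
value = lagrangian_estimate(nu_zero=0.5, norm_regularizer=1.0, lam=0.25, constraint=-0.1)
print(value)
```

Note that when `norm_regularizer` is zero, the `lam` term drops out of the sum.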