zr-obp
Self Normalized Estimator _estimate_round_rewards is wrong?
In SelfNormalizedInverseProbabilityWeighting._estimate_round_rewards, the denominator of the returned value is iw.mean(), when I believe it should be iw.sum(). I think this affects the confidence intervals computed for this class.
I found this when I noticed that the SNIPS estimator had unusually high variance compared with the IPW estimator.
This means that _estimate_policy_value in InverseProbabilityWeighting (the base class) may need to change as well: it returns the .mean() of the per-round rewards, and the definition of the SNIPS estimator has no such 1/n normalizing constant.
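For reference, here is a minimal NumPy sketch of the quantities involved. It is not copied from the library: the data are made up, and the variable names (reward, iw, rounds_mean, rounds_sum) are my own. It only assumes the per-round terms have the form reward * iw divided by a normalizer, as described above, and compares each normalizer/aggregation pairing against the textbook SNIPS definition sum(iw * reward) / sum(iw).

```python
import numpy as np

# Synthetic data, for illustration only (not from any real bandit log).
rng = np.random.default_rng(0)
n = 1000
reward = rng.binomial(1, 0.3, size=n).astype(float)
iw = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # stand-in importance weights

# SNIPS by its definition: weighted rewards over the sum of weights.
snips = np.sum(iw * reward) / np.sum(iw)

# Per-round terms normalized by iw.mean() (the current behaviour described above).
rounds_mean = reward * iw / iw.mean()

# Per-round terms normalized by iw.sum() (the change suggested above).
rounds_sum = reward * iw / iw.sum()

print("definition:              ", snips)
print("mean-normalized, .mean():", rounds_mean.mean())
print("sum-normalized,  .sum(): ", rounds_sum.sum())
print("sum-normalized,  .mean():", rounds_sum.mean())
```

In this sketch, the mean-normalized terms aggregated with .mean() and the sum-normalized terms aggregated with .sum() both match the definition, while the sum-normalized terms aggregated with .mean() are smaller by a factor of n. That is why the two methods would have to be changed together, and it suggests the two conventions differ in the scale of the per-round terms rather than in the point estimate.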