
Self Normalized Estimator _estimate_round_rewards is wrong?

Open szsb26 opened this issue 2 years ago • 0 comments

In SelfNormalizedInverseProbabilityWeighting._estimate_round_rewards, the denominator of the returned value is iw.mean(), when I believe it should be iw.sum(). I think this affects the computation of the confidence intervals for this class.

I found this after noticing that the SNIPS estimator had unusually high variance, higher than that of the IPW estimator.

This means that _estimate_policy_value in InverseProbabilityWeighting (the base class) may need to change as well, since it returns the .mean(), and the definition of the SNIPS estimator has no such normalizing constant.
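To make the two normalizations easier to compare, here is a minimal NumPy sketch on synthetic data. It does not reproduce the library's code: snips_round_terms is a hypothetical helper, the per-round form reward * iw / iw.mean() is taken from the description above, and the lognormal weights and Bernoulli rewards are made-up inputs chosen only for illustration.

```python
import numpy as np


def snips_round_terms(reward, iw, normalizer="mean"):
    """Per-round SNIPS terms under one of two normalizations.

    Hypothetical helper for illustration; not part of the library.
    """
    reward = np.asarray(reward, dtype=float)
    iw = np.asarray(iw, dtype=float)
    if normalizer == "mean":
        # Form described in the issue: divide by the mean importance weight.
        return reward * iw / iw.mean()
    if normalizer == "sum":
        # Form proposed in the issue: divide by the sum of importance weights.
        return reward * iw / iw.sum()
    raise ValueError(f"unknown normalizer: {normalizer!r}")


# Synthetic inputs (assumptions, not real logged bandit data).
rng = np.random.default_rng(0)
n = 1000
iw = rng.lognormal(mean=0.0, sigma=1.0, size=n)       # importance weights
reward = rng.binomial(1, 0.3, size=n).astype(float)   # binary rewards

terms_mean = snips_round_terms(reward, iw, "mean")
terms_sum = snips_round_terms(reward, iw, "sum")

# Aggregate each form the way it would be reduced to a policy value:
# mean-normalized terms are averaged, sum-normalized terms are summed.
estimate_from_mean = terms_mean.mean()
estimate_from_sum = terms_sum.sum()

# Textbook SNIPS definition for reference.
textbook_snips = (reward * iw).sum() / iw.sum()

print("mean-normalized, then .mean():", estimate_from_mean)
print("sum-normalized, then .sum():  ", estimate_from_sum)
print("sum(r * iw) / sum(iw):        ", textbook_snips)
print("per-round std (mean form):    ", terms_mean.std())
print("per-round std (sum form):     ", terms_sum.std())
```

The printed values let the point estimates and the spread of the per-round terms be compared directly. The per-round terms of the two forms differ by a factor of n, which matters for any confidence interval computed from those terms rather than from the aggregated estimate.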

szsb26, Nov 28 '22 22:11