vowpal_wabbit
Use the IPS estimator for the average loss of --cb_type dr
See #1696 for more details
--cb_type dr selects a learning mode, but we should also make it possible to compute the reported average loss with the IPS estimator.
I'm working on the OPE docs for vowpalwabbit.org (https://github.com/VowpalWabbit/vowpalwabbit.github.io/pull/193). I don't know the state of this issue, but being able to learn with one cb_type and report loss with a different estimator, such as IPS, would be very useful. I suspect many people just grid-search everything and wonder why it isn't working.
Should this feature request also include the ability to use the IPS/DR estimator to evaluate the average PV loss of a policy (offline) when --cb_type mtr is used in the learning algorithm?
To address this we can:
- [ ] Port Python implementations of estimators to C++ (separate lib to core)
- [ ] Change loss calculation for CB explore reductions to use estimators
- [ ] Make the choice of estimator configurable
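
To make the checklist above concrete, here is a minimal Python sketch of what "report average loss with a configurable estimator" could look like. It is only an illustration under assumptions: `LoggedExample`, `ips_loss`, `dr_loss`, `ESTIMATORS`, and `average_loss` are hypothetical names, not part of the VW API or of the existing Python estimators, and the cost-model predictions used by DR are assumed to be supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class LoggedExample:
    """One logged contextual-bandit interaction (hypothetical container)."""
    action: int                       # index of the action the logging policy took
    cost: float                       # cost observed for that action
    logged_prob: float                # probability the logging policy gave that action
    target_probs: Sequence[float]     # target policy's probability for every action
    predicted_costs: Sequence[float]  # cost-model prediction per action (DR only)


def ips_loss(examples: Sequence[LoggedExample]) -> float:
    """Inverse propensity scoring: reweight each observed cost by pi(a|x) / p_log(a|x)."""
    total = 0.0
    for ex in examples:
        weight = ex.target_probs[ex.action] / ex.logged_prob
        total += weight * ex.cost
    return total / len(examples)


def dr_loss(examples: Sequence[LoggedExample]) -> float:
    """Doubly robust: model-based baseline plus an IPS-weighted correction of its error."""
    total = 0.0
    for ex in examples:
        # Expected predicted cost under the target policy.
        baseline = sum(p * c for p, c in zip(ex.target_probs, ex.predicted_costs))
        weight = ex.target_probs[ex.action] / ex.logged_prob
        correction = weight * (ex.cost - ex.predicted_costs[ex.action])
        total += baseline + correction
    return total / len(examples)


# The estimator used for reporting is chosen independently of how the policy was trained.
ESTIMATORS: Dict[str, Callable[[Sequence[LoggedExample]], float]] = {
    "ips": ips_loss,
    "dr": dr_loss,
}


def average_loss(examples: Sequence[LoggedExample], estimator: str = "ips") -> float:
    """Report the average loss with the requested estimator (hypothetical entry point)."""
    return ESTIMATORS[estimator](examples)
```

The point of the registry is that the reporting estimator becomes a separate setting from --cb_type, so a policy learned with dr or mtr could still have its loss reported with IPS.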