vowpal_wabbit

Use the IPS estimator to compute average loss with --cb_type dr

Open · marco-rossi29 opened this issue 5 years ago · 3 comments

See #1696 for more details

--cb_type dr controls how the policy is learned, but we should also be able to use the IPS estimator to compute the reported average loss.
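For reference, the IPS (inverse propensity scoring) estimate of a policy's average loss on logged contextual-bandit data reweights each logged cost by 1/p whenever the evaluated policy picks the logged action, and counts zero otherwise. The minimal sketch below assumes a deterministic policy given as a plain Python callable; `LoggedEvent` and `ips_average_loss` are hypothetical names for illustration, not part of VW.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class LoggedEvent:
    context: Any   # features seen by the logging policy
    action: int    # action the logging policy chose
    cost: float    # cost observed for that action
    prob: float    # probability the logging policy assigned to that action


def ips_average_loss(events: Sequence[LoggedEvent],
                     policy: Callable[[Any], int]) -> float:
    """IPS estimate of the average cost of a deterministic `policy` (hypothetical helper)."""
    if not events:
        raise ValueError("need at least one logged event")
    total = 0.0
    for e in events:
        if e.prob <= 0.0:
            raise ValueError("logging probabilities must be positive")
        # Events where the policy disagrees with the log contribute zero.
        if policy(e.context) == e.action:
            total += e.cost / e.prob
    return total / len(events)


# Toy log: the policy "always pick action 1" matches events 0 and 2.
events = [
    LoggedEvent("u1", 1, 0.0, 0.5),
    LoggedEvent("u2", 2, 1.0, 0.25),
    LoggedEvent("u3", 1, 1.0, 0.5),
]
estimate = ips_average_loss(events, lambda ctx: 1)  # (0/0.5 + 1/0.5) / 3 = 2/3
```

Because this estimate depends only on the logged (action, cost, probability) triples and the policy's choices, it can be computed the same way no matter which cb_type was used to train the policy.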

marco-rossi29, Jan 14 '20 16:01

I'm working on the OPE (off-policy evaluation) docs for vowpalwabbit.org (https://github.com/VowpalWabbit/vowpalwabbit.github.io/pull/193). I don't know the current state of this issue, but being able to learn with one cb_type and report loss with a different estimator, such as IPS, would be very useful. I suspect many people simply grid-search over everything and then wonder why it isn't working.

maxpagels, Apr 05 '21 14:04

Should this feature request also cover using the IPS or DR estimator to evaluate a policy's average progressive validation (PV) loss offline when the learning algorithm uses --cb_type mtr?
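For comparison, the doubly robust (DR) estimator starts from a cost model's prediction for the evaluated policy's action and uses IPS weighting only to correct that model's error on the logged action. In the sketch below, `cost_model` is a hypothetical callable standing in for whatever regressor supplies predicted costs, and `dr_average_loss` is likewise illustrative rather than a VW API. With a cost model that always predicts zero, DR reduces to IPS.

```python
from typing import Any, Callable, List, Tuple

# Each logged event: (context, logged action, observed cost, logging probability).
Event = Tuple[Any, int, float, float]


def dr_average_loss(events: List[Event],
                    policy: Callable[[Any], int],
                    cost_model: Callable[[Any, int], float]) -> float:
    """Doubly robust estimate of a deterministic policy's average cost (hypothetical helper)."""
    if not events:
        raise ValueError("need at least one logged event")
    total = 0.0
    for context, action, cost, prob in events:
        if prob <= 0.0:
            raise ValueError("logging probabilities must be positive")
        chosen = policy(context)
        # Direct-method term: the model's predicted cost for the policy's action.
        estimate = cost_model(context, chosen)
        if chosen == action:
            # IPS-weighted correction for the model's error on the logged action.
            estimate += (cost - cost_model(context, action)) / prob
        total += estimate
    return total / len(events)


events = [("u1", 1, 0.0, 0.5), ("u2", 2, 1.0, 0.25), ("u3", 1, 1.0, 0.5)]
always_one = lambda ctx: 1
dr_const = dr_average_loss(events, always_one, lambda ctx, a: 0.5)  # (-0.5 + 0.5 + 1.5) / 3 = 0.5
dr_zero = dr_average_loss(events, always_one, lambda ctx, a: 0.0)   # equals the IPS estimate, 2/3
```

In this framing, the evaluation estimator (IPS or DR) is a separate choice from the learner's cb_type, which is what the question above is asking about.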

rangi513, May 14 '21 19:05

To address this, we can:

  • [ ] Port the Python implementations of the estimators to C++ (as a library separate from core)
  • [ ] Change loss calculation for CB explore reductions to use estimators
  • [ ] Make the choice of estimator configurable
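The plan separates how the policy is learned from how its loss is reported. A minimal Python sketch of that separation follows; the incremental interface (`Estimator`, `add`, `get`, `make_estimator`) is hypothetical and is not VW's actual C++ design. Here `p_target` is the probability the evaluated policy assigns to the logged action (1 or 0 for a deterministic policy), and the estimator name would come from some configuration option independent of --cb_type.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class Estimator(ABC):
    """Incremental off-policy estimate of average cost (hypothetical interface)."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    @abstractmethod
    def term(self, p_log: float, cost: float, p_target: float,
             model_target: float, model_logged: float) -> float:
        """Per-event contribution to the average."""

    def add(self, p_log: float, cost: float, p_target: float,
            model_target: float = 0.0, model_logged: float = 0.0) -> None:
        if p_log <= 0.0:
            raise ValueError("logging probability must be positive")
        self.total += self.term(p_log, cost, p_target, model_target, model_logged)
        self.count += 1

    def get(self) -> float:
        return self.total / self.count if self.count else 0.0


class IPS(Estimator):
    def term(self, p_log, cost, p_target, model_target, model_logged):
        # IPS ignores model predictions: importance-weighted observed cost.
        return cost * p_target / p_log


class DR(Estimator):
    def term(self, p_log, cost, p_target, model_target, model_logged):
        # Expected model cost under the target policy plus a weighted residual.
        return model_target + (p_target / p_log) * (cost - model_logged)


ESTIMATORS: Dict[str, Type[Estimator]] = {"ips": IPS, "dr": DR}


def make_estimator(name: str) -> Estimator:
    try:
        return ESTIMATORS[name]()
    except KeyError:
        raise ValueError(f"unknown estimator {name!r}; choose from {sorted(ESTIMATORS)}") from None


# Select the reporting estimator by name, independent of how the policy was trained.
est = make_estimator("ips")
est.add(p_log=0.5, cost=1.0, p_target=1.0)   # target policy agrees: weight 2
est.add(p_log=0.25, cost=1.0, p_target=0.0)  # target policy disagrees: weight 0
ips_value = est.get()  # (2.0 + 0.0) / 2 = 1.0
```

Keeping the estimator behind a small add/get interface means a learner only has to feed it per-event quantities, so adding another estimator later would not touch the learning code.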

jackgerrits, Dec 01 '22 21:12