
Enable Exploration Distribution Generation strategy injection

Open lokitoth opened this issue 4 years ago • 0 comments

Right now it is possible to run a VW model (or any i_model) that supports exploration out of the box, with the PMF output serving as the "exploration distribution". It is also possible to do manual exploration by specifying the PMF directly via the "Passthrough" mechanism.

Running a VW (or other) model and then changing the exploration distribution (for example, to increase action diversity) is difficult. It requires setting up two live_model instances, one of which must be correctly configured not to log; running the model-backed instance; grabbing its output PMF; tweaking it; generating a new JSON query; and running that query through a "Passthrough" live_model.
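For illustration, the "tweaking" step in that workaround can be as small as mixing the model's PMF with a uniform distribution. This is a minimal sketch in plain Python, independent of the library's actual API; `mix_with_uniform` and `epsilon` are hypothetical names chosen here, not existing functions or settings.

```python
from typing import List


def mix_with_uniform(pmf: List[float], epsilon: float) -> List[float]:
    """Blend a PMF with the uniform distribution to raise action diversity.

    Hypothetical helper for illustration; not part of the library.
    """
    if not pmf:
        raise ValueError("pmf must not be empty")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    total = sum(pmf)
    if total <= 0.0:
        raise ValueError("pmf must have positive total mass")
    uniform = 1.0 / len(pmf)
    # Normalize the input first so the result sums to 1 even if the
    # incoming PMF is slightly off, then take the convex combination.
    return [(1.0 - epsilon) * (p / total) + epsilon * uniform for p in pmf]
```

With `epsilon = 0` the PMF is returned unchanged (up to normalization); with `epsilon = 1` every action becomes equally likely.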

It would be nice to be able to instantiate a live model with a custom exploration strategy callback which will be used to tweak the output from i_model before sampling.
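One possible shape for such a hook is sketched below in plain Python rather than against the real live_model interface. Every name here (`ExploringModel`, `ExplorationStrategy`, `choose`, `min_probability_floor`) is hypothetical, and the model is assumed to be any callable that returns a PMF over actions.

```python
import random
from typing import Any, Callable, List, Optional, Tuple

Pmf = List[float]
# Hypothetical callback type: takes the model's PMF, returns an adjusted PMF.
ExplorationStrategy = Callable[[Pmf], Pmf]


def min_probability_floor(floor: float) -> ExplorationStrategy:
    """Example strategy: lift every action to at least `floor`, then renormalize.

    After renormalization the smallest entries can end up slightly below
    `floor`, but no action is left with zero probability when floor > 0.
    """
    def strategy(pmf: Pmf) -> Pmf:
        raised = [max(p, floor) for p in pmf]
        total = sum(raised)
        return [p / total for p in raised]
    return strategy


class ExploringModel:
    """Wraps a PMF-producing model and applies a strategy before sampling."""

    def __init__(
        self,
        model: Callable[[Any], Pmf],
        strategy: Optional[ExplorationStrategy] = None,
        rng: Optional[random.Random] = None,
    ) -> None:
        self._model = model
        self._strategy = strategy
        self._rng = rng or random.Random()

    def choose(self, context: Any) -> Tuple[int, Pmf]:
        pmf = self._model(context)
        if self._strategy is not None:
            # The callback adjusts the distribution before any sampling happens.
            pmf = self._strategy(pmf)
        action = self._rng.choices(range(len(pmf)), weights=pmf, k=1)[0]
        # Return the adjusted PMF alongside the action so a caller can record
        # the distribution that was actually sampled from.
        return action, pmf
```

In this sketch a single instance does the whole job: `ExploringModel(model, strategy=min_probability_floor(0.1)).choose(context)` replaces the two-instance round trip, and omitting the strategy leaves the model's own PMF untouched.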

lokitoth, Mar 04 '20 19:03