reinforcement_learning
Enable Exploration Distribution Generation strategy injection
Right now it is possible to run a VW model (or any i_model) that supports exploration out of the box, relying on the PMF output to represent an "exploration distribution". It is also possible to do manual exploration by supplying the PMF directly via the "Passthrough" mechanism.
Running a VW model (or other model) and then modifying the exploration distribution (for example, to increase action diversity) is difficult: it requires setting up two live_model instances (one of which is configured not to log), running the first, grabbing the output PMF, tweaking it, generating a new JSON query, and running that through a "Passthrough" live_model.
It would be nice to be able to instantiate a live_model with a custom exploration strategy callback that would be used to tweak the output of i_model before sampling.
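A minimal sketch of what such an injected strategy might look like, in Python. Note that this is an assumption about the proposed API, not the current one: the diversify_pmf callback, the sampling helper, and the idea of passing the callback at construction time are all hypothetical. The example only demonstrates the core transformation the issue asks for: taking the PMF produced by the underlying i_model and reshaping it toward more action diversity before an action is sampled.

```python
import random


def diversify_pmf(pmf, epsilon=0.2):
    """Hypothetical exploration callback: mix the model's PMF with a
    uniform distribution so that low-probability actions are sampled
    more often. The result still sums to 1."""
    n = len(pmf)
    return [(1.0 - epsilon) * p + epsilon / n for p in pmf]


def sample_action(pmf, rng=random):
    """Sample an action index from a PMF by inverse-CDF sampling --
    roughly what live_model would do after applying the injected
    exploration strategy."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(pmf):
        cumulative += p
        if r < cumulative:
            return i
    return len(pmf) - 1  # guard against floating-point round-off


# The PMF produced by the underlying i_model...
model_pmf = [0.9, 0.05, 0.05]
# ...is tweaked by the injected callback before sampling.
explore_pmf = diversify_pmf(model_pmf, epsilon=0.3)
action = sample_action(explore_pmf)
```

With this shape, the two-instance Passthrough workaround described above collapses into a single live_model configured with one extra callback parameter.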