Adding New Action(s) to a Bandit Policy
I understand from the Per-Arm Features tutorial that it may be "cumbersome to add" a new action to a policy, but what is the procedure for doing so?
For example, if I have a LinUCB agent that is trained with 5 candidate actions, but over time I'd like to add a 6th action candidate, how would I do so?
Hi David,

The best option you have is to add "blank" actions and enable them later. To do so:
- Estimate how many actions you will want to add later. For the sake of this example, let's say it's 5. Then, you define your agent with 5+5=10 actions.
- When initializing the agent, add the parameter observation_and_action_constraint_splitter. This should be a function that, when presented with an observation, spits out the actual context and a binary action mask. For now, the easiest thing would be:
def splitter(obs):
  # Context passes through unchanged; the mask disables the 5 "blank" actions.
  return (obs, [[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]])
This way only the first 5 actions will be eligible at any time. Note that the double bracket is for the batch dimension; if you have batch_size > 1, you have to modify this output accordingly.
- To enable a new action, save the model variables, then initialize a new agent with those variables and a new splitter that allows the 6th action (a sketch of the full save-and-restore flow is below). If you want to enable actions in a timely manner, you can also add a loop parameter in the action mask.
Let me know if that helps! Gabor
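Putting the steps together, a minimal sketch assuming tf_agents' LinearUCBAgent; make_splitter, CONTEXT_DIM, and the checkpoint path are illustrative, and checkpointing is one way (among others) to carry the variables over to the new agent:

import tensorflow as tf
from tf_agents.bandits.agents import lin_ucb_agent
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

CONTEXT_DIM = 4   # illustrative context size
NUM_ACTIONS = 10  # 5 live actions + 5 "blank" slots reserved for later

observation_spec = tensor_spec.TensorSpec([CONTEXT_DIM], tf.float32)
time_step_spec = ts.time_step_spec(observation_spec)
action_spec = tensor_spec.BoundedTensorSpec(
    shape=(), dtype=tf.int32, minimum=0, maximum=NUM_ACTIONS - 1)

def make_splitter(num_enabled):
  # Builds a splitter whose mask enables only the first num_enabled slots.
  mask = [[1 if i < num_enabled else 0 for i in range(NUM_ACTIONS)]]
  def splitter(obs):
    return obs, tf.constant(mask, dtype=tf.int32)
  return splitter

agent = lin_ucb_agent.LinearUCBAgent(
    time_step_spec=time_step_spec,
    action_spec=action_spec,
    observation_and_action_constraint_splitter=make_splitter(5))
agent.initialize()

# ... train with 5 eligible actions, then save the model variables ...
ckpt_path = tf.train.Checkpoint(agent=agent).save('/tmp/bandit_ckpt')

# Rebuild with a splitter that also enables the 6th action, then restore
# the saved variables into the new agent.
new_agent = lin_ucb_agent.LinearUCBAgent(
    time_step_spec=time_step_spec,
    action_spec=action_spec,
    observation_and_action_constraint_splitter=make_splitter(6))
new_agent.initialize()
tf.train.Checkpoint(agent=new_agent).restore(ckpt_path)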
@bartokg, this makes sense! Thanks a lot.
@bartokg As I understand from looking into the code, the per-arm implementation of LinUCB maintains just a single \theta, whereas in the LinUCB paper there is a \theta for every arm. Can you explain the reasoning behind this, or point to a relevant paper that supports it?
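For reference, a sketch of the two parameterizations being contrasted, in the usual notation of the LinUCB paper (Li et al., 2010); the joint feature map \phi below is illustrative:

Disjoint LinUCB, one parameter vector per arm $a$:
$$\hat{r}_{t,a} = x_t^\top \theta_a$$

Per-arm features, one shared parameter vector over joint context-arm features:
$$\hat{r}_{t,a} = \phi(x_t, z_a)^\top \theta$$

where $x_t$ is the global context at time $t$ and $z_a$ are the features of arm $a$.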