
Adding New Action(s) to a Bandit Policy

Open davidcereal opened this issue 3 years ago • 3 comments

I understand from the Per-Arm Features tutorial that it may be "cumbersome to add" a new action to a policy, but what is the procedure for doing so?

For example, if I have a LinUCB agent that is trained with 5 candidate actions, but over time I'd like to add a 6th action candidate, how would I do so?

davidcereal avatar Oct 22 '21 19:10 davidcereal

Hi David, The best option you have is to add "blank" actions and enable them later. To do so:

  1. Estimate how many actions you will want to add later. For the sake of this example, let's say it's 5. Then, you define your agent with 5+5=10 actions.
  2. When initializing the agent, add the parameter observation_and_action_constraint_splitter. This should be a function that, when presented with an observation, spits out the actual context and a binary action mask. For now the easiest thing would be:
def splitter(obs):
  # Pass the context through unchanged; only the first 5 of the 10 actions are eligible.
  return obs, [[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]]

This way only the first 5 actions will be eligible at any time. Note that the double bracket is for the batch dimension; if you have batch_size>1, then you have to modify this output accordingly.

  3. To enable a new action, just save the model variables, and initialize a new agent with those variables and a new splitter that allows the 6th action. If you want to enable actions on a schedule, you can also make the action mask depend on a loop variable. Let me know if that helps! Gabor
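The steps above can be sketched independently of the agent itself. This is a minimal illustration, not TF-Agents API code: `make_splitter`, `NUM_ACTIONS`, and the NumPy mask are assumptions for the example; in practice the splitter is passed to the agent's observation_and_action_constraint_splitter parameter and the mask may need to be a tensor.

```python
import numpy as np

# Hypothetical sizes: 5 live actions plus 5 "blank" placeholders = 10 total.
NUM_ACTIONS = 10

def make_splitter(num_enabled, batch_size=1):
  """Build a splitter-style function that passes the context through
  and masks out all but the first `num_enabled` actions."""
  mask = np.zeros((batch_size, NUM_ACTIONS), dtype=np.int32)
  mask[:, :num_enabled] = 1

  def splitter(obs):
    # The agent receives the raw context plus a binary eligibility mask.
    return obs, mask

  return splitter

# Start with only the first 5 actions eligible.
splitter = make_splitter(num_enabled=5)
_, mask = splitter(None)
print(mask)  # [[1 1 1 1 1 0 0 0 0 0]]

# Later, "enable" the 6th action by rebuilding the agent (restoring the
# saved variables) with a new splitter, so nothing is retrained from scratch.
splitter = make_splitter(num_enabled=6)
```

Rebuilding the splitter rather than mutating the mask in place keeps the eligibility change explicit at the point where the new agent is initialized.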

bartokg avatar Oct 25 '21 08:10 bartokg

@bartokg, this makes sense! Thanks a lot.

davidcereal avatar Nov 02 '21 18:11 davidcereal

@bartokg As I understand from looking into the code, the per-arm implementation of LinUCB maintains just a single \theta, whereas in the LinUCB paper there is a \theta for every arm. Can you explain the reasoning behind this, or point to a relevant paper that supports it?

sj31867 avatar Jul 06 '23 05:07 sj31867