
Save predicted reward for chosen arm (feature request)

Open pstansell opened this issue 5 years ago • 2 comments

Hello Robin,

This is a feature request, not a bug report.

I'd like the output from history$get_data_table() to include a column for the predicted values of the chosen arms at each step.

For example, for EpsilonGreedyPolicy it would just be self$theta$mean[[chosen_arm]], which I realise is available by setting save_theta = TRUE in Simulator$new. If I also set save_context = TRUE, the predicted value of the chosen action can be obtained. There is a catch, though: the saved theta values are one time step ahead of the current context-arm pair, because they have already been updated with that pair's reward. In other words, they hold the estimates computed after the current reward is known, not the predictions made before the arm was chosen.
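The timing issue can be illustrated with a minimal base-R sketch. This is not contextual's code: the function run_epsilon_greedy and the columns predicted and theta_after are hypothetical names, and the Bernoulli rewards are an assumption made only to keep the example self-contained. It logs the running mean of the chosen arm both before and after the reward update, which shows why the post-update value cannot serve as the prediction for the current step.

```r
# Minimal base-R sketch of an epsilon-greedy loop (NOT contextual's implementation).
# All names (run_epsilon_greedy, predicted, theta_after) are hypothetical.
run_epsilon_greedy <- function(horizon, probs, epsilon = 0.1, seed = 1) {
  set.seed(seed)
  k <- length(probs)
  theta <- list(n = rep(0, k), mean = rep(0, k))
  choice <- integer(horizon)
  reward <- numeric(horizon)
  predicted <- numeric(horizon)
  theta_after <- numeric(horizon)
  for (t in seq_len(horizon)) {
    if (runif(1) > epsilon) {
      arm <- which.max(theta$mean)      # exploit (first maximum on ties)
    } else {
      arm <- sample.int(k, 1)           # explore
    }
    predicted[t] <- theta$mean[arm]     # estimate BEFORE the reward is seen
    reward[t] <- rbinom(1, 1, probs[arm])
    theta$n[arm] <- theta$n[arm] + 1
    theta$mean[arm] <- theta$mean[arm] +
      (reward[t] - theta$mean[arm]) / theta$n[arm]
    theta_after[t] <- theta$mean[arm]   # estimate AFTER the update
    choice[t] <- arm
  }
  data.frame(t = seq_len(horizon), choice, reward, predicted, theta_after)
}

history <- run_epsilon_greedy(horizon = 200, probs = c(0.2, 0.8))
head(history)
```

In this sketch, the predicted value for an arm's current pull equals the theta_after value from that arm's previous pull, so recovering predictions from post-update values means shifting them back by one pull per arm.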

With other policies, such as ContextualEpsilonGreedyPolicy, using the output from history$get_data_table() to compute the expected reward for the current action before it is taken is not so straightforward. I see in policy_cmab_lin_epsilon_greedy.R that you compute expected_rewards[arm], but you don't seem to save the values for output later on. It is exactly expected_rewards[arm] that I would like history$get_data_table() to include in its output. Having expected_rewards[arm] for just the chosen arm would be enough for my current needs, but maybe having expected_rewards[arm] for all arms would be useful in future.
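As a rough sketch of what is being asked for, the base-R code below computes an expected reward for every arm and returns the whole vector alongside the choice, so a caller could log it. It assumes per-arm linear estimates of the form theta_hat = solve(A) %*% b, which may not match what policy_cmab_lin_epsilon_greedy.R actually does. The helpers expected_rewards_linear and choose_and_record are hypothetical and are not part of contextual.

```r
# Hypothetical helpers, written in base R; not taken from contextual.
# Assumption: each arm keeps a matrix A and a vector b, and its estimate
# is theta_hat = solve(A) %*% b.
expected_rewards_linear <- function(A, b, x) {
  vapply(seq_along(A), function(arm) {
    theta_hat <- solve(A[[arm]], b[[arm]])
    sum(x * theta_hat)                  # predicted reward for context x
  }, numeric(1))
}

choose_and_record <- function(A, b, x, epsilon = 0.1) {
  expected_rewards <- expected_rewards_linear(A, b, x)
  if (runif(1) > epsilon) {
    choice <- which.max(expected_rewards)
  } else {
    choice <- sample.int(length(A), 1)
  }
  # Return the predictions with the choice so they can be saved later.
  list(choice = choice,
       predicted = expected_rewards[choice],
       expected_rewards = expected_rewards)
}

A <- list(diag(2), diag(2))
b <- list(c(1, 0), c(0, 2))
action <- choose_and_record(A, b, x = c(1, 1), epsilon = 0)
str(action)
```

Returning the full vector rather than only the chosen arm's value covers both needs mentioned above: the chosen arm's prediction now, and all arms' predictions later.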

I had a look at history.R to see if I could work out how to save the values of expected_rewards, but it looks rather complicated to me and my R is nowhere near as good as yours :-).

Thanks,

Paul

pstansell, Dec 25 '19 12:12

Hi @pstansell - first of all, my apologies for my late reply! Is this issue still of relevance to you?

robinvanemden, Mar 04 '20 15:03

Hello Robin,

Yes, this issue is still relevant to me. I'd like the predicted values of every arm so that I can rank the arms by predicted value before a particular arm is chosen.
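Given a vector of per-arm predictions, the ranking itself is a one-liner in base R. The sketch below assumes such a vector is available at each step. The function rank_arms and the example numbers are made up for illustration.

```r
# Hypothetical helper: arm indices ordered from highest to lowest prediction.
rank_arms <- function(expected_rewards) {
  order(expected_rewards, decreasing = TRUE)
}

expected_rewards <- c(0.12, 0.47, 0.30)   # illustrative values only
ranking <- rank_arms(expected_rewards)    # arm 2 first, then arm 3, then arm 1

# Position of a given arm in the ranking, e.g. the arm that ends up chosen.
chosen_arm <- 3
match(chosen_arm, ranking)
```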

Thanks,

Paul

pstansell, Mar 05 '20 11:03