Specific Continuous Range of Actions based on context

Open kkchaitu27 opened this issue 2 years ago • 9 comments

Short description

One of the wiki pages states: "However sometimes the actions that can be taken might be dependent on the context. In this case, one can specify examples by listing the available actions". But this is available only for discrete actions. I searched for the same functionality for continuous action spaces but could not find any.

It seems logical to extend context-specific actions from the discrete case to the continuous case. If the functionality is already available, I think it also needs to be described in the wiki.

kkchaitu27 avatar May 05 '22 14:05 kkchaitu27

Hi @kkchaitu27 this functionality is not currently available and that is why it is not listed in the wiki.

Just to clarify, are you talking about allowing a different continuous range per context? Right now there is support for contextual bandits with a continuous action range, but you can only specify one continuous range, which is then sampled from based on the context.

olgavrou avatar May 05 '22 14:05 olgavrou

In the discrete case, we can give the set of actions that are allowed for a context, whereas there is no way to give a different range of actions for a given context. As you say, we have a fixed range of continuous actions, which is the same for all contexts.

kkchaitu27 avatar May 05 '22 16:05 kkchaitu27

@olgavrou Is it difficult to implement this feature? Could you give me some date by which this can be implemented?

kkchaitu27 avatar May 09 '22 05:05 kkchaitu27

Hi @kkchaitu27, this feature would require creating an additional reduction for CATS; it is not the simplest of features.

I will discuss with the team and get back to you. This is not in the works currently nor prioritised

olgavrou avatar May 09 '22 14:05 olgavrou

@kkchaitu27 , specifying the action dependent features for continuous actions seems nonviable since the size of an input example would be effectively infinite---you need to specify feature values for every one of a continuous set of actions.

How were you imagining this would work?

JohnLangford avatar Jun 02 '22 15:06 JohnLangford

@JohnLangford Sorry for the late reply. If we can have a subset of the discrete action space for a context, I don't understand why we cannot have a subset of the continuous action space for a context. We could give the minimum and maximum of the part of the continuous action space that is applicable for a given context.

I could not understand your point, though. Could you explain it in layman's terms?

kkchaitu27 avatar Jun 15 '22 14:06 kkchaitu27

I think I understand now. You want to be able to constrain the range of actions for individual events, but not to specify the features of every action (which could never work in a continuous setting). Right? These are conflated in the discrete action setting.

There is nothing limiting the feasibility of this as far as I can tell---it's just a matter of doing. It seems difficult to manage this in logarithmic time, as per the continuous action trees, but if you are willing to swallow a one-against-all style this seems relatively easy to implement.
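A minimal sketch of that one-against-all idea, assuming the continuous range is discretized into equidistant centers and that some model has already produced one predicted cost per center. The names `action_grid` and `best_action_in_range` are hypothetical and are not part of Vowpal Wabbit; the point is only that scanning every center is linear in the number of centers, which makes a per-event range constraint trivial to apply.

```python
from typing import List, Sequence


def action_grid(min_value: float, max_value: float, num_actions: int) -> List[float]:
    """Equidistant bin centers covering [min_value, max_value]."""
    width = (max_value - min_value) / num_actions
    return [min_value + (i + 0.5) * width for i in range(num_actions)]


def best_action_in_range(
    centers: Sequence[float],
    predicted_costs: Sequence[float],
    lo: float,
    hi: float,
) -> float:
    """Return the lowest-cost center inside [lo, hi].

    Every center is scored and checked, so this is O(len(centers)),
    unlike a tree-based search that runs in logarithmic time.
    """
    allowed = [(cost, a) for a, cost in zip(centers, predicted_costs) if lo <= a <= hi]
    if not allowed:
        raise ValueError("no discretized action falls inside the allowed range")
    return min(allowed)[1]


# Hypothetical predicted costs for four centers over the global range [100, 120].
centers = action_grid(100.0, 120.0, 4)
costs = [0.1, 0.9, 0.5, 0.3]
constrained = best_action_in_range(centers, costs, lo=105.0, hi=120.0)
unconstrained = best_action_in_range(centers, costs, lo=100.0, hi=120.0)
```

With these made-up costs, the unconstrained choice is the first center, while restricting the event to 105:120 excludes it and the cheapest remaining center is picked instead.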

Would you want to do this?

JohnLangford avatar Jun 15 '22 17:06 JohnLangford

@JohnLangford I would like to clarify what I have said so far with an example. Consider the following data:

100:110 105:2.5:0.01 | a b c
105:120 110:1.4:0.02 | a b d
100:110 108:3.2:0.08 | a b c
102:115 103:4.2:0.05 | a d e

The first value in each row is the continuous range of actions allowed in that context, the second value is action:cost:probability_density_value, and the context features follow the | separator.

I don't know C++, and I don't see how a one-against-all style could be used here. It would help if you could explain with a simple example.
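To make the proposed layout concrete, here is a small sketch of how one such row could be parsed. This format is only the proposal from this comment, not an input format that Vowpal Wabbit accepts, and `RangedExample` and `parse_row` are hypothetical names. The sketch assumes the range token is written as min:max.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RangedExample:
    lo: float          # minimum action allowed in this context
    hi: float          # maximum action allowed in this context
    action: float      # action that was taken
    cost: float        # observed cost of that action
    pdf_value: float   # probability density of the taken action
    features: List[str]


def parse_row(row: str) -> RangedExample:
    """Parse 'lo:hi action:cost:pdf | feature feature ...' into a RangedExample."""
    header, _, context = row.partition("|")
    range_token, label_token = header.split()
    lo, hi = (float(x) for x in range_token.split(":"))
    action, cost, pdf_value = (float(x) for x in label_token.split(":"))
    if not lo <= action <= hi:
        raise ValueError(f"action {action} lies outside the allowed range {lo}:{hi}")
    return RangedExample(lo, hi, action, cost, pdf_value, context.split())


example = parse_row("100:110 105:2.5:0.01 | a b c")
```

The range check rejects rows whose logged action lies outside the range declared for that context, which all four rows above satisfy.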

kkchaitu27 avatar Jun 16 '22 12:06 kkchaitu27

The current approach builds a tree over a discretized equidistant set of actions, then creates a continuous function by randomizing with a kernel. The variable interval here doesn't map neatly onto the tree's fixed interval. You could of course train the tree by giving it a large loss whenever it picks something outside of the interval. Would that be adequate in your case? So predictions might exceed the interval, but if the intervals are predictable it will learn to avoid doing so?
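A rough sketch of that workaround, under stated assumptions: the kernel is taken to be uniform within a bandwidth around the chosen center, and the penalty value of 100.0 is arbitrary. `sample_around` and `penalized_cost` are hypothetical helpers, not Vowpal Wabbit functions.

```python
import random


def sample_around(center: float, bandwidth: float,
                  min_value: float, max_value: float,
                  rng: random.Random) -> float:
    """Draw uniformly within +/- bandwidth of center, clipped to the global range."""
    low = max(min_value, center - bandwidth)
    high = min(max_value, center + bandwidth)
    return rng.uniform(low, high)


def penalized_cost(action: float, observed_cost: float,
                   lo: float, hi: float, penalty: float = 100.0) -> float:
    """Cost fed back to the learner: a large penalty if action is outside [lo, hi]."""
    return observed_cost if lo <= action <= hi else penalty


rng = random.Random(0)
# Global range is [100, 120]; this event only allows actions in [100, 110].
action = sample_around(center=109.0, bandwidth=2.0,
                       min_value=100.0, max_value=120.0, rng=rng)
cost = penalized_cost(action, observed_cost=2.5, lo=100.0, hi=110.0)
```

Because the sample is drawn from [107, 111], it can land above 110, in which case the learner receives the penalty instead of the observed cost. Nothing here prevents an out-of-range prediction; it only makes such predictions expensive.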

JohnLangford avatar Jun 17 '22 18:06 JohnLangford