ioc
Problems with Interest Function Gradient Update in Four Rooms (tabular)
There are two problems with the function `update()` of the class `InterestFunctionGradient` in `ioc/tabular/interestoptioncritic_tabular_fr.py`.
First problem:
`update()` is supposed to calculate the gradient with respect to the current option's weights. However, when calculating `gradnormalizer`, some of the partial derivatives are taken with respect to other options' weights. I think all the partial derivatives should be taken with respect to the current option's weight only, which means that every term in `gradnormalizer` should be zero except the term that includes the current option's interest function, because all the other interest functions are independent of the current option's weight.
Second problem:
The gradient is a vector quantity, so all weights should be updated. Since the final probability of choosing an option depends on all of the weights, every weight has a non-zero partial derivative. However, `update()` only updates the current option's weight.
I'll illustrate these two problems with a simple example:
Suppose there is only 1 state and only 2 options. Let the weight of option 0 be `w_0` and the weight of option 1 be `w_1`, and suppose option 0 is the current option. The final probability of choosing option 0 is `pi_omega(0)*expit(w_0) / [pi_omega(0)*expit(w_0) + pi_omega(1)*expit(w_1)]`.
In the paper, the gradient of this probability is taken with respect to `z`; in this example, `z` is `[w_0, w_1]`. Since this probability depends on both `w_0` and `w_1`, its gradient with respect to `z` is a vector with two non-zero components.
However, `update()` considers only the first component of this gradient vector (the second problem). Furthermore, the first component should be the partial derivative of this probability with respect to `w_0`, yet in the code some terms in `gradnormalizer` are taken with respect to `w_1` (the first problem).
Is there a misunderstanding, or is this really a problem?
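To make the two-option example above concrete, here is a minimal numeric sketch (names like `p_option0` are illustrative, not from the repo; `pi_omega` is assumed fixed at 0.5 for both options). It checks the full gradient with respect to `z = [w_0, w_1]` by central differences and shows both components are non-zero:

```python
import math

def expit(x):
    """Logistic sigmoid, as in scipy.special.expit."""
    return 1.0 / (1.0 + math.exp(-x))

def p_option0(w0, w1, pi0=0.5, pi1=0.5):
    """Probability of choosing option 0 under the interest-weighted policy."""
    a = pi0 * expit(w0)
    b = pi1 * expit(w1)
    return a / (a + b)

def grad_p_option0(w0, w1, eps=1e-6):
    """Central-difference gradient of p_option0 w.r.t. z = [w_0, w_1]."""
    d0 = (p_option0(w0 + eps, w1) - p_option0(w0 - eps, w1)) / (2 * eps)
    d1 = (p_option0(w0, w1 + eps) - p_option0(w0, w1 - eps)) / (2 * eps)
    return [d0, d1]

grad = grad_p_option0(0.3, -0.2)
# Both components are non-zero: raising w_0 increases the probability of
# option 0, raising w_1 decreases it.
```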
We believe there is a misunderstanding here.
In fact, both problems seem to stem from the way things are written in the code. The calculation of `gradnormalizer` is correct in that all the partial derivatives are indeed taken with respect to the current option's weight only. The confusion arises because we use `z` to refer to the parameters of the specific option being updated, `[z_o]`, rather than to the whole vector of weights `[z_0, z_1]`. This notation has been used previously in the Option-Critic paper and subsequent work (it can also be seen in the way the termination function is updated).
Hope this clarifies the confusion. This also bears on the second problem you indicated: in the call-and-return implementation, the update should ideally touch only the current option's weights, which is indeed what happens in our implementation. Moreover, even though the gradient is a vector quantity, there is no weight sharing between states in the tabular case, so it is effectively a scalar (zeros everywhere except at the state where the update is being made).
We would also recommend looking at the derivations in appendix sections A.2.1 and A.2.5 to help clarify this.
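The tabular point can be illustrated with a short sketch (variable names and shapes here are hypothetical, not taken from the repo): because each `(option, state)` pair has its own independent weight, the gradient is zero at every entry except the one visited, and the vector update collapses to a single scalar step.

```python
import numpy as np

# Hypothetical tabular interest-weight table: one entry per (option, state).
noptions, nstates = 2, 4
weights = np.zeros((noptions, nstates))

def tabular_interest_update(weights, option, state, grad, lr=0.1):
    """Apply the gradient step only at the (current option, current state)
    entry; all other entries of the gradient are zero in the tabular case,
    so the full-vector update reduces to this single scalar step."""
    weights[option, state] += lr * grad
    return weights

tabular_interest_update(weights, option=0, state=2, grad=0.5)
# Only entry (0, 2) changes; every other weight stays at zero.
```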
The second problem is clear now. However, I still can't quite figure out the first problem. In the code, `gradlist1 = [self.interest_functions[opt].pmf(phi)*(1-self.interest_functions[opt].pmf(phi)) for opt in range(self.noptions)]`. It seems to me that `gradlist1` should be the list of gradients of all the interest functions with respect to the current option's weight, i.e. `[dI_{w_0,z_0}/dz_o, dI_{w_1,z_1}/dz_o, ...]`, since each interest function has its own `z`. Why isn't `gradlist1` all zero except at the index of the current option, given that the other interest functions are not parameterized by the current option's weight `z_o`?
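As a numerical check of this reading (a sketch under the assumption that each option's interest is a sigmoid of its own scalar weight; all names are hypothetical): differentiating every option's interest with respect to the current option's weight gives exactly zero everywhere except at the current option's index.

```python
import math

def expit(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical per-option interest weights (one scalar z per option, one state).
z = [0.4, -0.7, 1.2]
current = 0  # index of the current option, whose weight z_o we differentiate by

def interest(opt, zs):
    """Interest of option `opt`, parameterized only by its own weight zs[opt]."""
    return expit(zs[opt])

def d_interest_d_z_current(opt, zs, eps=1e-6):
    """Central difference of option `opt`'s interest w.r.t. zs[current]."""
    zp, zm = list(zs), list(zs)
    zp[current] += eps
    zm[current] -= eps
    return (interest(opt, zp) - interest(opt, zm)) / (2 * eps)

grads = [d_interest_d_z_current(opt, z) for opt in range(len(z))]
# Only grads[current] is non-zero; the other interest functions never read
# z[current], so their derivatives with respect to it are exactly zero.
```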
Also, could you please clarify all the `w` and `z` in the `gradnormalizer` term of the final equation in appendix section A.2.5? For example, I was wondering whether the `z` in `I_{w,z}(s)` is the same `z` that the gradient is taken with respect to, since I think the confusion stems from this equation.
Hi, apologies for the delay; we somehow missed getting back to you on this. Was this by any chance resolved? I believe you are onto something important here. We will be happy to take a more detailed look if you think this is still an issue. Please let us know.