Bug in optimization of the Bayesian layer (Variational Posterior)
First, thank you for the great work! I really like the ideas you presented with UCB, which is why I looked at your code in detail and stumbled across the following:
In your implementation of the variational posterior for the Bayesian layer, the two parameters registered for optimization (i.e., initialized with torch.nn.Parameter) are mu and rho. However, rho is only used once, to initialize sigma. After that it has no influence on sigma and does not affect subsequent iterations (apart from scaling the learning rate). In other words, rho is subject to optimization, but its optimized value has no effect on the weight distribution. I checked this explicitly: sigma stays the same throughout the whole training process. As a result, the reported uncertainty is not the outcome of optimization but of random initialization.
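For context, here is a minimal sketch (not code from the repository; `BayesianLinear`, `weight_mu` and `weight_rho` are illustrative names) of how sigma is usually tied to rho so that gradients keep flowing, i.e. sigma is recomputed from rho on every forward pass instead of once at construction:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Illustrative variational layer: sigma must be derived from rho every forward pass."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.zeros(out_features, in_features).normal_(0, 0.1))
        self.weight_rho = nn.Parameter(torch.full((out_features, in_features), -3.0))
        # Buggy pattern: computing sigma once here would freeze it at its initial value,
        # so optimizing rho would never change the posterior width:
        # self.weight_sigma = torch.log1p(torch.exp(self.weight_rho))

    def forward(self, x):
        # Correct pattern: recompute sigma from rho on every call, so the optimizer's
        # updates to rho actually change the sampled weights.
        weight_sigma = F.softplus(self.weight_rho)       # sigma = log(1 + exp(rho))
        eps = torch.randn_like(weight_sigma)
        weight = self.weight_mu + weight_sigma * eps     # reparameterization trick
        return F.linear(x, weight)
```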
I am also a little confused about the function update_lr(), so I have the following questions, hoping that I understood the code correctly:
- Why are the classifiers only subject to optimization in the first task?
- When the function is used for the adaptive learning rate update, why are the means not included in the optimization?
I hope you can help me with these concerns.
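To make the second question concrete, here is a rough sketch of what I would have expected an uncertainty-adaptive learning-rate update to look like, with both the means and the classifier heads kept in the optimizer via per-parameter groups. This is my own assumption about the intent, not the repository's actual update_lr(); `build_adaptive_optimizer`, `weight_mu` and `weight_rho` are hypothetical names:

```python
import torch

def build_adaptive_optimizer(model, base_lr=1e-3):
    """Hypothetical sketch: one parameter group per Bayesian weight, with the
    learning rate of mu scaled by the current posterior uncertainty (sigma)."""
    groups, bayesian_params = [], set()
    for module in model.modules():
        if hasattr(module, "weight_mu") and hasattr(module, "weight_rho"):
            with torch.no_grad():
                sigma = torch.log1p(torch.exp(module.weight_rho)).mean().item()
            # Low-uncertainty (important) weights move slowly; rho keeps the base rate.
            groups.append({"params": [module.weight_mu], "lr": base_lr * sigma})
            groups.append({"params": [module.weight_rho], "lr": base_lr})
            bayesian_params.update({module.weight_mu, module.weight_rho})
    # Everything else (e.g. task-specific classifier heads) trains at the base rate.
    rest = [p for p in model.parameters() if p not in bayesian_params]
    if rest:
        groups.append({"params": rest, "lr": base_lr})
    return torch.optim.Adam(groups)
```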
I faced the same issue. I found out that torch.as_tensor() is the cause: it blocks the gradient flow. It is better to change these two lines:
- `log_var = w1 * torch.as_tensor(lvps).to(device).mean()`
- `log_p = w2 * torch.as_tensor(lps).to(device).mean()`

to the following:

- `log_var = w1 * torch.stack(lvps).to(device).mean()`
- `log_p = w2 * torch.stack(lps).to(device).mean()`
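As a quick sanity check (standalone example, not code from the repository): torch.as_tensor() on a list of tensors copies the values into a fresh tensor with no autograd history, while torch.stack() keeps the graph intact, so gradients can reach the variational parameters:

```python
import torch

# Stand-ins for the per-layer log-posterior terms collected in lvps / lps.
a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)
terms = [a * 2, b * 3]

# torch.stack keeps the autograd graph, so the loss backpropagates to a and b.
loss = torch.stack(terms).mean()
loss.backward()
print(a.grad, b.grad)        # tensor(1.) tensor(1.5000)

# torch.as_tensor builds a new tensor from the values, with no grad history,
# so a loss built from it would never update mu or rho.
copied = torch.as_tensor(terms)
print(copied.requires_grad)  # False
```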