AdvancedVI.jl

Proximal operator for the entropy of location-scale families

Open Red-Portal opened this issue 9 months ago • 6 comments

This adds the proximal operator for the entropy of location-scale families, ProximalLocationScaleEntropy, which was proposed by J. Domke[^D2020] and later theoretically and empirically analyzed by J. Domke and myself [^DGG2023][^KJWMG2023].

Proximal operators guarantee that the scale matrix never becomes singular, which addresses the limitations of projection operators (ClipScale). Mainly, ClipScale requires an explicit lower bound on the posterior variance, which is arbitrary. Even then, if the lower bound is too loose, the algorithm may be unstable depending on the initialization and the stepsize. In fact, when I experimented with the parameter-free optimization algorithms currently provided by AdvancedVI, DoG and DoWG tended to be very aggressive in terms of stepsize, and ClipScale showed instabilities.
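To make the contrast concrete, here is a minimal sketch in plain Python of the two updates applied to a single diagonal entry of the scale matrix. It is not the AdvancedVI implementation; the function names `prox_entropy_diag` and `clip_scale_diag` are hypothetical. It assumes the entropy of a location-scale family depends on the scale only through the sum of the logs of its diagonal entries, so the proximal operator of the negative entropy with stepsize `stepsize` reduces to solving `x^2 - c*x - stepsize = 0` for each diagonal entry `c` and leaves off-diagonal entries untouched.

```python
import math

def prox_entropy_diag(c, stepsize):
    """Proximal step for -log(x) applied to one diagonal scale entry (illustrative)."""
    # Minimizer of -log(x) + (x - c)^2 / (2 * stepsize) over x > 0,
    # i.e. the positive root of x^2 - c*x - stepsize = 0.
    return 0.5 * (c + math.sqrt(c * c + 4.0 * stepsize))

def clip_scale_diag(c, lower_bound):
    """Projection-style alternative: clamp the entry to a hand-picked floor."""
    return max(c, lower_bound)

# An aggressive gradient step can push a diagonal entry to zero or below.
for c in (1.0, 0.0, -0.5):
    print(c, prox_entropy_diag(c, stepsize=0.1), clip_scale_diag(c, lower_bound=1e-4))
```

The proximal update returns a strictly positive value for any input and any positive stepsize, with no floor to choose, whereas the clamped value depends entirely on the chosen `lower_bound`.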

In the context of Turing, the combination of ProximalLocationScaleEntropy and DoWG or DoG should provide a robust tuning-free default setting for variational inference. (This is why I am working on this before Turing integration.)

Proximal operators depend on the internals of the optimization algorithm in use. This is fairly straightforward for algorithms that reduce everything to a scalar stepsize, like DoG and DoWG. For those that use a vector-valued stepsize, things are less straightforward.
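The scalar-stepsize case can be sketched as follows. This is an illustrative plain-Python example for a mean-field (diagonal) location-scale family, not AdvancedVI code; `proximal_gradient_step` and its arguments are hypothetical names. It assumes the supplied gradients are those of the energy term only, with the entropy handled entirely by the proximal operator.

```python
import math

def proximal_gradient_step(location, scale_diag, grad_location, grad_scale_diag, stepsize):
    """One proximal gradient step with a single scalar stepsize (illustrative)."""
    # Plain gradient descent step on both the location and the scale diagonal.
    new_location = [m - stepsize * g for m, g in zip(location, grad_location)]
    stepped = [c - stepsize * g for c, g in zip(scale_diag, grad_scale_diag)]
    # Proximal operator of the negative entropy, reusing the same scalar stepsize.
    new_scale = [0.5 * (c + math.sqrt(c * c + 4.0 * stepsize)) for c in stepped]
    return new_location, new_scale

# The second scale entry would go negative after the gradient step (0.2 - 0.5 * 3.0),
# but the proximal operator maps it back to a positive value.
loc, scale = proximal_gradient_step(
    [0.0, 1.0], [1.0, 0.2], [0.5, -0.5], [0.1, 3.0], stepsize=0.5
)
print(loc, scale)
```

The proximal operator needs the same stepsize that the gradient step used, which is why it has to know about the optimizer's internals. With a per-coordinate stepsize there is no single value to pass in, which is where the difficulty mentioned above comes from.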

[^D2020]: Domke, Justin. "Provable smoothness guarantees for black-box variational inference." International Conference on Machine Learning. PMLR, 2020.

[^DGG2023]: Domke, Justin, Robert Gower, and Guillaume Garrigos. "Provable convergence guarantees for black-box variational inference." Advances in Neural Information Processing Systems 36 (2023): 66289-66327.

[^KJWMG2023]: Kim, Kyurae, et al. "On the convergence of black-box variational inference." Advances in Neural Information Processing Systems 36 (2023): 44615-44657.

Red-Portal avatar Mar 14 '25 20:03 Red-Portal

Ah, sorry for the slow response on this! I'll take a look as soon as I get some free time (probably Wednesday).

sunxd3 avatar Mar 17 '25 09:03 sunxd3

@sunxd3 @mhauru @yebai Could we move this forward?

Red-Portal avatar Mar 25 '25 20:03 Red-Portal

Oops, sorry for forgetting about this. I'll take a look tomorrow morning.

sunxd3 avatar Mar 25 '25 20:03 sunxd3

Small technical question: am I reading it correctly that AdvancedVI right now uses the linear parametrization?

sunxd3 avatar Mar 26 '25 14:03 sunxd3

> Small technical question: am I reading it correctly that AdvancedVI right now uses the linear parametrization?

Yes, the default settings do, hence the involvement of ClipScale or ProximalLocationScaleEntropy, but users can implement their own nonlinearly parameterized location-scale families if they wish.

Red-Portal avatar Mar 28 '25 18:03 Red-Portal

Hmmm... seems like mapreduce with Zygote is broken again.

Red-Portal avatar Mar 28 '25 18:03 Red-Portal