SossMLJ.jl
Bayesian ElasticNet
Would be a first non-sklearn thing
- requires the ability to use a "product" prior (Laplace x Normal)
This would be a great example to have.
I agree this would be at least interesting, and possibly very useful. Somewhere in that interval ;)
The challenge here isn't specific to SossMLJ, or even to Soss. The density for a mixture distribution is a linear combination of the densities of its components. But here, the linear combination for the prior isn't one of densities but of log-densities. So this "just" requires building a "geometric mixture".
Is that even a thing? I have no idea. But defining this (even entirely outside Soss) would make it pretty easy to express this model.
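For what it's worth, a minimal sketch of such a "geometric mixture" defined entirely outside Soss might look like the following. The struct and function names here are made up, and the normalizing constant is dropped, which is fine for MCMC but means this isn't a proper normalized density:

```julia
using Distributions

# Hypothetical sketch: a "geometric mixture" of two distributions,
# whose log-density is a convex combination of the component
# log-densities (equivalently, a product of fractional powers of the
# component densities, up to normalization).
struct GeometricMixture{D1,D2,T}
    d1 :: D1
    d2 :: D2
    w  :: T   # weight on d1, in [0, 1]
end

# Log-density up to an additive constant (the log-normalizer is dropped)
unnorm_logpdf(g::GeometricMixture, x) =
    g.w * logpdf(g.d1, x) + (1 - g.w) * logpdf(g.d2, x)

g = GeometricMixture(Laplace(0, 1), Normal(0, 1), 0.5)
unnorm_logpdf(g, 0.3)
```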
I'm not sure what the problem is (sorry if I'm being dense). I mean, products of distributions would pop up everywhere if you consider graphical models, right? Is the issue that it's potentially hard in general to sample from a product, or that there is no object currently to represent a product of distributions?
Got this to work:
julia> using Distributions
julia> using Soss
julia> import TransformVariables
julia> struct Elastic{T}
L1 :: T
σ :: T
end
julia> function Distributions.logpdf(α::Elastic, x)
ℓ1 = α.L1 * logpdf(Laplace(0, α.σ), x)
ℓ2 = (1 - α.L1) * logpdf(Normal(0, α.σ), x)
return ℓ1 + ℓ2
end
julia> # Just a hack to account for another hack
Base.rand(::Elastic) = 0.0
julia> Soss.xform(::Elastic, _data) = TransformVariables.asℝ
julia> mt = @model X begin
n = size(X,1)
k = size(X,2)
p ~ Uniform()
priorScale ~ HalfNormal()
noise ~ HalfNormal()
β ~ Normal(0,priorScale) |> iid(k)
yhat = X * β
y ~ For(n) do j
Normal(yhat[j], noise)
end
end;
julia> m = @model X begin
n = size(X,1)
k = size(X,2)
p ~ Uniform()
priorScale ~ HalfNormal()
noise ~ HalfNormal()
β ~ Elastic(p,priorScale) |> iid(k)
yhat = X * β
y ~ For(n) do j
Normal(yhat[j], noise)
end
end;
julia> X = rand(10,3);
julia> truth = rand(mt(X=X));
julia> post = dynamicHMC(m(X=X), (y=truth.y,));
julia> particles(post)
(priorScale = 0.825 ± 0.57, p = 0.481 ± 0.29, noise = 0.424 ± 0.12, β = Particles{Float64,1000}[0.394 ± 0.38, -0.367 ± 0.27, -0.173 ± 0.34])
The product in graphical models is a product distribution, as in tuples. Here it's a product as in multiplication. Gaussians behave nicely for this, but it's not so common. And here it's a bit weirder, since we're actually doing fractional exponents.
Anyway, if you work in log space it's nice enough.
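To illustrate "Gaussians behave nicely": a geometric mixture of two Gaussians is again Gaussian, because the weighted log-densities stay quadratic in x, so the precisions combine linearly. A hedged sketch (the helper name is made up):

```julia
using Distributions

# Sketch: raising N(μ₁, σ₁²) and N(μ₂, σ₂²) to powers a and (1 - a)
# and multiplying gives, after renormalizing, a Gaussian with
# precision a/σ₁² + (1 - a)/σ₂² and a precision-weighted mean.
function geometric_mixture(n1::Normal, n2::Normal, a)
    τ1, τ2 = 1 / n1.σ^2, 1 / n2.σ^2
    τ = a * τ1 + (1 - a) * τ2
    μ = (a * τ1 * n1.μ + (1 - a) * τ2 * n2.μ) / τ
    Normal(μ, sqrt(1 / τ))
end

g = geometric_mixture(Normal(0, 1), Normal(2, 2), 0.5)
# g.μ == 0.4, g.σ == sqrt(1 / 0.625)
```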
Sorry about the rand bit; that's just a hack in dynamicHMC to get a starting point. There's probably a way to work around that. And there may be a nice way to sample from "geometric mean" distributions like this, but I'm not sure.
Shouldn't the sigma in your logpdf be read from alpha? Other than that, cool!
(PS: I don't get the distinction between Normal(r; params) * Laplace(x; params) * Normal(x; params) and an undirected graphical model with two nodes, but we might be talking past each other, and I doubt it's a very important point.)