
Fitting a beta mixture fails with no method found

Open · danielinteractive opened this issue on Nov 10 '23 · 10 comments

Hi @dmetivie,

First of all, thanks for this Julia package, and apologies if I make some very naive mistakes here, since this is one of my first times meddling with Julia ever...

I am trying to fit a mixture of two beta distributions, but fit_mle does not find a suffstats method in this case:

using ExpectationMaximization
using Distributions # for Beta and MixtureModel
using StatsPlots # for histogram (re-exports Plots)

# Try to fit a beta mixture.
N = 50_000
α₁ = 10
β₁ = 5
α₂ = 5
β₂ = 10
π = 0.3

# Mixture Model of two betas.
mix_true = MixtureModel([Beta(α₁, β₁), Beta(α₂, β₂)], [π, 1 - π]) 

# Generate N samples from the mixture.
y = rand(mix_true, N)
histogram(y)

# Initial guess.
mix_guess = MixtureModel([Beta(1, 1), Beta(1, 1)], [0.5, 1 - 0.5])
test = rand(mix_guess, N)

# Fit the MLE with the EM algorithm:
mix_mle = fit_mle(mix_guess, y)
# ERROR: suffstats is not implemented for (Beta{Float64}, Vector{Float64}, Vector{Float64}).

My status:

(@v1.9) pkg> status
Status `~/.julia/environments/v1.9/Project.toml`
  [336ed68f] CSV v0.10.11
  [a93c6f00] DataFrames v1.6.1
  [31c24e10] Distributions v0.25.103
  [e1fe09cc] ExpectationMaximization v0.2.2
  [f3b207a7] StatsPlots v0.15.6
  [fce5fe82] Turing v0.29.3

danielinteractive · Nov 10 '23

Could we add our own suffstats method to fix this? At least from https://math.stackexchange.com/questions/321952/beta-distribution-sufficient-statistic it seems this should be relatively straightforward?
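For reference, the sufficient statistics from that link are just the (weighted) means of log(x) and log(1 - x). A minimal sketch, where x and w are illustrative names for the samples in (0, 1) and their weights:

# weighted sufficient statistics of a Beta sample: means of log(x) and log(1 - x)
beta_suffstats(x, w) = (sum(w .* log.(x)) / sum(w), sum(w .* log.(1 .- x)) / sum(w))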

danielinteractive · Nov 10 '23

Thanks for your interest in the package.

fit_mle is already defined for the Beta distribution in Distributions.jl; see the function fit_mle(::Type{<:Beta}, x::AbstractArray{T}; ...).

Error messages can be hard to read at first. In your case, the REPL (terminal) complains that fit_mle(Beta, y, w) does not exist, i.e. the weighted version (where each sample in y is given a weight):

  [2] suffstats(::Type{Beta{Float64}}, ::Vector{Float64}, ::Vector{Float64})
    @ Distributions C:\Users\metivier\.julia\packages\Distributions\SUTV1\src\genericfit.jl:5
  [3] fit_mle(dt::Type{Beta{Float64}}, x::Vector{Float64}, w::Vector{Float64})

In classical EM, this weighted version is what is used at each M-step. And in the link above, you can check that it is currently not defined (as opposed to some other distributions, like Exponential).
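Schematically (a sketch of the idea, not the package's exact internals), each M-step does one weighted fit per component, where γ[i, k] is the E-step posterior weight of sample i for component k:

# hypothetical M-step: one weighted fit per component,
# which needs the missing fit_mle(Beta, y, w) method
m_step(y, γ) = [fit_mle(Beta, y, γ[:, k]) for k in 1:size(γ, 2)]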

I don't know about fit_mle(Beta, y, w) (and suffstats). A quick search seems to say it does exist in some R packages and papers. Hence, if you manage to code it, you could add it:

  • as a PR into Distributions.jl -> an open source contribution!
  • or, as a first step, just define it for yourself. Then ExpectationMaximization.jl will work.

dmetivie · Nov 10 '23

Otherwise, you can use the StochasticEM method that I implemented (I did some elementary tests, but I am not sure how robust it is). In the case of StochasticEM, only fit_mle(Beta, y) is needed.
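The reason only the unweighted fit is needed: the stochastic E-step draws a hard component assignment for each sample instead of keeping fractional weights, so the M-step fits each cluster with the plain fit_mle. A rough sketch of the idea (illustrative names, not the package's actual code):

# draw a component index for each sample from its posterior weights γ[i, :],
# then fit each resulting cluster with the unweighted fit_mle
function stochastic_m_step(y, γ)
    z = [rand(Categorical(γ[i, :])) for i in eachindex(y)]
    return [fit_mle(Beta, y[z .== k]) for k in 1:size(γ, 2)]
end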

using ExpectationMaximization
using Distributions
using Plots
using Random

Random.seed!(1234)

# Try to fit a beta mixture.
N = 50_000
α₁ = 10
β₁ = 5
α₂ = 5
β₂ = 10
π = 0.3

# Mixture Model of two betas.
mix_true = MixtureModel([Beta(α₁, β₁), Beta(α₂, β₂)], [π, 1 - π]) 

# Generate N samples from the mixture.
y = rand(mix_true, N)
stephist(y, label = "true distribution", norm = :pdf)

# Initial guess.
mix_guess = MixtureModel([Beta(1, 1), Beta(1, 1)], [0.5, 1 - 0.5])
test = rand(mix_guess, N)
stephist!(test, label = "guess distribution", norm = :pdf)

# Fit the MLE with the classic EM algorithm:
# mix_mle = fit_mle(mix_guess, y)
# ERROR: suffstats is not implemented for (Beta{Float64}, Vector{Float64}, Vector{Float64}).

mix_mle_SEM = fit_mle(mix_guess, y, method = StochasticEM())
fit_y = rand(mix_mle_SEM, N)
stephist!(fit_y, label = "fitMLE distribution", norm = :pdf)

It works!

dmetivie · Nov 10 '23

Awesome, thanks so much @dmetivie! The stochastic EM works perfectly fine here.

danielinteractive · Nov 10 '23

In classical EM, this weighted version is what is used at each M-step.

And in the link above, you can check that it is currently not defined (as opposed to some other distributions, like Exponential).

I don't know about fit_mle(Beta, y, w) (and suffstats). A quick search seems to say it does exist in some R packages and papers. Hence, if you manage to code it, you could add it.

We might want to look into this soon. When I searched, I did not see any obvious R package implementations or papers. Can you maybe share what you found?

My guess would be that we could do a linearly weighted log-likelihood and maximize that.
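Concretely, something like this (with Distributions loaded; x and w are illustrative names for the samples and weights):

# weighted Beta log-likelihood to maximize over (α, β)
wloglik(α, β, x, w) = sum(w[i] * logpdf(Beta(α, β), x[i]) for i in eachindex(x, w))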

danielinteractive · Nov 22 '23

Hi @dmetivie, so I am happy that I got the weighted likelihood maximization working for the Beta distribution. However, when using it in the backend for the classic EM algorithm, the Beta mixture is not fitted correctly: the two components come out identical. Do you have an idea what is going wrong? Thanks a lot!

using Distributions
import Distributions: fit_mle, suffstats # import these so our methods extend Distributions' functions rather than shadow them
using SpecialFunctions
using LinearAlgebra

# sufficient statistics - these capture everything of the data we need
struct BetaStats <: SufficientStats
    sum_log_x::Float64 # (weighted) sum of log(x)
    sum_log_1mx::Float64 # (weighted) sum of log(1 - x)
    tw::Float64 # total sample weight
    x_bar::Float64 # (weighted) mean of x
    v_bar::Float64 # (weighted) variance of x
end

function suffstats(::Type{<:Beta}, x::AbstractArray{T}, w::AbstractArray{T}) where T<:Real

    tw = 0.0
    sum_log_x = zero(T)
    sum_log_1mx = zero(T)
    x_bar = zero(T)

    for i in eachindex(x, w)
        @inbounds xi = x[i]
        0 < xi < 1 || throw(DomainError(xi, "samples must be larger than 0 and smaller than 1"))
        @inbounds wi = w[i]
        wi >= 0 || throw(DomainError(wi, "weights must be non-negative"))
        isfinite(wi) || throw(DomainError(wi, "weights must be finite"))
        @inbounds sum_log_x += wi * log(xi)
        @inbounds sum_log_1mx += wi * log(one(T) - xi)
        @inbounds x_bar += wi * xi
        tw += wi
    end
    sum_log_x /= tw
    sum_log_1mx /= tw
    x_bar /= tw
    # weighted variance about the weighted mean (varm(x, x_bar) would ignore the weights)
    v_bar = sum(w[i] * abs2(x[i] - x_bar) for i in eachindex(x, w)) / tw

    BetaStats(sum_log_x, sum_log_1mx, tw, x_bar, v_bar)
end

# without weights we just put weight 1 for each observation
function suffstats(::Type{<:Beta}, x::AbstractArray{T}) where T<:Real
    w = ones(Float64, length(x))
    suffstats(Beta, x, w)
end

# generic fit function based on the sufficient statistics, on the log scale to be robust
function fit_mle(::Type{<:Beta}, ss::BetaStats;
    maxiter::Int=1000, tol::Float64=1e-14)

    # Initialize the parameters at the method-of-moments estimates.
    temp = ((ss.x_bar * (1 - ss.x_bar)) / ss.v_bar) - 1
    α₀ = ss.x_bar * temp
    β₀ = (1 - ss.x_bar) * temp

    g₁ = ss.sum_log_x
    g₂ = ss.sum_log_1mx

    # Newton iteration on θ = (log α, log β) keeps α, β > 0 automatically.
    θ = [log(α₀); log(β₀)]

    converged = false
    t = 0
    while !converged && t < maxiter
        t += 1
        α = exp(θ[1])
        β = exp(θ[2])
        temp1 = digamma(α + β)
        temp2 = trigamma(α + β)
        temp3 = g₁ + temp1 - digamma(α)
        grad = [temp3 * α
                (temp1 + g₂ - digamma(β)) * β]
        hess = [((temp2 - trigamma(α)) * α + temp3) * α  temp2 * β * α
                temp2 * α * β  ((temp2 - trigamma(β)) * β + temp1 + g₂ - digamma(β)) * β]
        Δθ = hess \ grad # Newton step
        θ .-= Δθ
        converged = dot(Δθ, Δθ) < 2 * tol # stopping criterion
    end

    α = exp(θ[1])
    β = exp(θ[2])
    return Beta(α, β)
end

# then define methods for the original data, forwarding the kwargs to the Newton solver
fit_mle(::Type{<:Beta}, x::AbstractArray{T}; maxiter::Int=1000, tol::Float64=1e-14) where T<:Real =
    fit_mle(Beta, suffstats(Beta, x); maxiter=maxiter, tol=tol)
fit_mle(::Type{<:Beta}, x::AbstractArray{T}, w::AbstractArray{T}; maxiter::Int=1000, tol::Float64=1e-14) where T<:Real =
    fit_mle(Beta, suffstats(Beta, x, w); maxiter=maxiter, tol=tol)

# now let's try it out (xtest was not defined above; any small sample in (0, 1) works, e.g.):
xtest = rand(Beta(2, 3), 8)

fit_mle(Beta, xtest)
fit_mle(Beta, xtest, ones(Float64, 8)) # unit weights should match the unweighted fit
Distributions.fit(Beta, xtest)
wtest = rand(Uniform(), 8)
fit_mle(Beta, xtest, wtest)

# Now finally having this in place the classic EM algorithm should work:
using ExpectationMaximization
using Distributions
using Plots
using Random

Random.seed!(1234)

# Try to fit a beta mixture.
N = 50_000
α₁ = 10
β₁ = 5
α₂ = 5
β₂ = 10
π = 0.3

# Mixture Model of two betas.
mix_true = MixtureModel([Beta(α₁, β₁), Beta(α₂, β₂)], [π, 1 - π]) 

# Generate N samples from the mixture.
y = rand(mix_true, N)
stephist(y, label = "true distribution", norm = :pdf)

# Initial guess.
mix_guess = MixtureModel([Beta(1, 1), Beta(1, 1)], [0.5, 1 - 0.5])
test = rand(mix_guess, N)
stephist!(test, label = "guess distribution", norm = :pdf)

# Fit the MLE with the classic EM algorithm:
mix_mle = fit_mle(mix_guess, y)
fit_y = rand(mix_mle, N)
stephist!(fit_y, label = "fitMLE distribution", norm = :pdf)
# This does not seem to work? Why?

# Test first if we can downweight large values and get expected result.
test_weights = Float64.(ifelse.(y .> 0.5, 0.01, 1))
low_res = fit_mle(Beta, y, test_weights)

fit_low_res = rand(low_res, N)
stephist!(fit_low_res, label = "fit_low_res", norm = :pdf)

# Same for high values
test_weights2 = Float64.(ifelse.(y .< 0.5, 0.01, 1))
high_res = fit_mle(Beta, y, test_weights2)

fit_high_res = rand(high_res, N)
stephist!(fit_high_res, label = "fit_high_res", norm = :pdf)
# So that seems to work fine.

danielinteractive · Nov 23 '23

Just tested again in a fresh folder, and the problem can be boiled down to this plot (image omitted): the green histogram is the fit from fit_mle() applied with a 2-component mixture, with the data coming from the blue, true 2-component mixture. The fitted mixture is the same beta distribution twice:

MixtureModel{Beta{Float64}}(K = 2)
components[1] (prior = 0.5000): Beta{Float64}(α=2.582748230188013, β=3.3169631350299884)
components[2] (prior = 0.5000): Beta{Float64}(α=2.582748230188013, β=3.3169631350299884)

On the other hand, downweighting the observations above 0.5 a lot and running fit_mle() with a single beta distribution yields fit_low_res, i.e. the pink distribution, which makes sense, and vice versa for fit_high_res, i.e. the light brown distribution. @dmetivie, any hints are much appreciated :)

danielinteractive · Nov 27 '23

Sorry for the delay, I was busy with other stuff!

What you observed is actually a common pitfall of the EM algorithm. With the poor mix_guess initial value, the EM gets stuck in a poor local maximum of the likelihood. (Note that your symmetric initial guess is in fact a fixed point: with two identical components and equal weights, every responsibility is 1/2, so each M-step refits both components on identically weighted data and they stay identical.) The stochastic version actually managed to escape that!

You can check that a better initial condition, like

mix_guess_2 = MixtureModel([Beta(10, 3), Beta(5, 12)], [0.3, 1 - 0.3])

converges to the true distribution.

With the kwarg infos = true you can extract the log-likelihood of the algorithm and select the best one across different initializations. I made something to facilitate that: just pass a vector of initial guesses. It will choose the one with the best likelihood (plus, it should suppress convergence failures, see issue #11):

fit_mle([mix_guess, mix_guess_2], y)

I should update the documentation and README.

dmetivie · Nov 28 '23

Thanks @dmetivie, that is super helpful! In that case, I guess I could even pass mixtures with different numbers of components in the vector to fit_mle()? And it will pick the one with the highest log-likelihood? (Hm, although then we might overfit, I guess, if we don't use BIC or something else that penalizes the number of parameters.)

danielinteractive · Nov 28 '23

Yes indeed, in that case BIC, AIC, etc. should be used. One would need to define something like bic(::MixtureModel, y). If that were defined, it could become an option for model selection in fit_mle.
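For instance, a minimal sketch of such a helper (the name and the parameter count below are illustrative, not an existing API):

# BIC for a K-component Beta mixture: 2K shape parameters + (K - 1) free weights
function bic_beta_mix(mix::MixtureModel, y)
    K = ncomponents(mix)
    k_free = 2K + (K - 1)
    return -2 * loglikelihood(mix, y) + k_free * log(length(y))
end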

dmetivie · Nov 28 '23