
likelihood fitting (possible super-useful feature) ?

Open mmikhasenko opened this issue 5 years ago • 13 comments

Often I need to minimize a likelihood function

llh(model, data, p) = - sum(log, model(data,p)) # assuming that the model is normalized
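
For concreteness, a minimal sketch with a made-up normalized model (the exponential density and the toy sample below are purely illustrative):

expmodel(x, p) = @. p[1] * exp(-p[1] * x)   # normalized pdf on [0, Inf) with rate p[1]
data = [0.1, 0.5, 1.2, 2.0]                 # toy sample
llh(expmodel, data, [1.0])                  # negative loglikelihood at p = [1.0]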

I like this package and its easy API very much, so it would be great if I could just run

fr = llh_fit(model, data, initial_p)

At first, it seemed to me that the extension would be straightforward: just replace the function at https://github.com/JuliaNLSolvers/LsqFit.jl/blob/master/src/curve_fit.jl#L46 with an llh_fit.

Unfortunately, I quickly realized that taking abs2 of the residuals is baked into levenberg_marquardt. Is there any possibility for such an extension?

mmikhasenko avatar Apr 04 '20 19:04 mmikhasenko

I found that Optim can do it

f(p) = -sum(x -> (x > 0) ? log(x) : -Inf, normalized_fit_function(data, p))
pfr = Optim.minimizer(Optim.optimize(f, init_pars, BFGS(),
               Optim.Options(show_trace = true); autodiff = :forwarddiff))

Perhaps it would just be nice to unify the API for error analysis between the packages.
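
One common recipe for the errors (a sketch, not an existing Optim API; it assumes the usual asymptotic normality of the maximum-likelihood estimate, with f and pfr as above):

using ForwardDiff, LinearAlgebra

H = ForwardDiff.hessian(f, pfr)    # Hessian of the negative loglikelihood at the minimum
covariance = inv(H)                # asymptotic covariance of the parameters
stderrs = sqrt.(diag(covariance))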

(can be closed, I guess)

mmikhasenko avatar Apr 06 '20 11:04 mmikhasenko

Could you do

llh(model) = (data,p) -> - sum(log, model(data,p))
result = curve_fit(llh(model), data, initial_p)

?

Magalame avatar Jun 23 '20 05:06 Magalame

Could you do

llh(model) = (data,p) -> - sum(log, model(data,p))
result = curve_fit(llh(model), data, initial_p)

?

@Magalame, that would be nice. Are you suggesting an interface to implement, or saying that this should already work with LsqFit? Doesn't that assume the curve_fit(model, x, y, p0) signature? I would need a y with the same size as x, and I do not see how to fake it for the likelihood fit in the current interface.

Let me know please if I misunderstood something.

mmikhasenko avatar Sep 27 '20 21:09 mmikhasenko

just checked that the x vector can have a different length than the y vector

using LsqFit

@. model(x, p) = p[1]*exp(-x*p[2])

xdata = range(0, stop=10, length=20)
ydata = model(xdata, [1.0, 2.0]) + 0.01*randn(length(xdata))
p0 = [0.5, 0.5]

# ordinary least-squares fit
fit1 = curve_fit(model, xdata, ydata, p0)
fit1.param
sigma = stderror(fit1)

# collapse all residuals into a single scalar, so y can be the 1-element vector [0]
model_chi2(xv, p) = [sqrt(sum(abs2, model(xdata, p) - ydata))]
fit2 = curve_fit(model_chi2, xdata, [0], p0)
fit2.param

The second fit gives slightly different parameter values, but close to the right ones.

Still not clear to me how to fake the llh fit.

mmikhasenko avatar Sep 28 '20 09:09 mmikhasenko

The second fit gives slightly different values of parameters, but close to the right ones.

Still not clear to me how to fake the llh fit.

If you take a closer look at the result of fit2, the residual vector only has a single component. I don't really get why you define model_chi2 in the way that you do. Since the square root is monotone increasing, you can omit it and the minimization problem will still be equivalent.

It works if you do it like this:

# residuals relative to ydata, fit against a vector of zeros
model_chi2_fixed(xdata,p) = model(xdata,p) - ydata
fit3 = curve_fit(model_chi2_fixed, xdata, zeros(length(xdata)), p0)
fit3.param == fit1.param   # identical residuals, so identical fit

Then again, it's not hard to see that this is precisely equivalent to how your fit1 works. If you assume that the uncertainties in ydata are normally distributed (which you typically do, although you have not specified any uncertainties in this toy example), then maximizing the likelihood is equivalent to minimizing chi squared, i.e. minimizing the squares of the residuals, which is what LsqFit.jl does anyway.
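
A quick numerical check of that equivalence, reusing the toy model from above (a sketch; Optim is only used here to minimize the Gaussian negative loglikelihood directly):

using Optim

nll(p) = 0.5 * sum(abs2, model(xdata, p) - ydata)   # Gaussian NLL with unit errors, up to a constant
p_mle = Optim.minimizer(optimize(nll, p0, BFGS(); autodiff = :forward))
p_mle ≈ fit1.param   # agrees up to optimizer tolerance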

You might also be interested in InformationGeometry.jl, which performs maximum likelihood estimation out of the box and uses LsqFit.jl under the hood.

RafaelArutjunjan avatar Oct 24 '20 20:10 RafaelArutjunjan

Thanks for checking. The example with fit1 and fit2 somehow drifted away from the original issue. For the LLH minimization, we do not have a y vector.

I thought that LsqFit does not have an interface that can be called without y, as @Magalame suggested. The difference from least squares: one does not subtract y, and one does not square the difference.

Now, I see in your code

curve_fit(DS,model,start; tol=tol).param

Hm, it does not seem to use a y vector. Could you confirm that it is doing exactly what we need for the LLH?

mmikhasenko avatar Oct 25 '20 19:10 mmikhasenko

Hm, it does not seem to use a y vector. Could you confirm that it is doing exactly what we need for the LLH?

Yes, it's correct. The data (i.e. xdata, ydata and covariance) are inside the DS object and I overloaded the curve_fit method for convenience such that it will accept the types I defined in InformationGeometry.jl.

But to reiterate: as long as your measurements are normally distributed around the true model (i.e. your error bars around the y-values are symmetric and represent the 1 sigma standard deviation), what LsqFit.jl is doing is already exactly equivalent to maximizing the loglikelihood.

In other words, if you use LsqFit.jl as it was intended (like you did in your example with fit1) then it already does what you want, namely what you call "likelihood fitting".

RafaelArutjunjan avatar Oct 25 '20 19:10 RafaelArutjunjan

Let's check if we are talking about different things.

  • binned LLH, vs
  • unbinned LLH

With your last passage on "exactly equivalent", it seems to me that you have in mind a binned likelihood fit, i.e.

using StatsBase

model(x,mu,sigma) = 1/sqrt(2*pi)/sigma * exp(-(x-mu)^2/2/sigma^2)

sample = randn(10_000)
bins = range(-2, 2, length=100)
h = fit(Histogram, sample, bins)   # 99 bins from 100 edges
y = h.weights

Whatever you do for y (LsqFit or LLH), you do not gain much (as you say, they are equivalent).

What I'm looking for is the minimization of

llh(mu,sigma) = - sum(log, model.(sample, mu,sigma))

It has a higher statistical power.

EDIT: log was missing in the sum
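
For the record, this unbinned minimization can be done with Optim (a sketch; sigma is parametrized as exp(q[2]) to keep it positive during the search):

using Optim

neg_llh(q) = -sum(x -> log(model(x, q[1], exp(q[2]))), sample)
res = optimize(neg_llh, [0.0, 0.0], NelderMead())
mu_hat, log_sigma_hat = Optim.minimizer(res)
(mu_hat, exp(log_sigma_hat))   # should be close to (0, 1) for randn data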

mmikhasenko avatar Oct 25 '20 20:10 mmikhasenko

Let's check if we are talking about different things. With your last passage on exactly equivalent, it seems to me that you have in mind binned likelihood fit

No. We are both talking about the same thing, i.e. likelihoods which are continuous (differentiable, actually) with respect to what you denoted with "x".

model(x,mu,sigma) = 1/sqrt(2*pi)/sigma * exp(-(x-mu)^2/2/sigma^2)

What I'm looking for is the minimization of

llh(mu,sigma) = - sum(model.(sample, mu,sigma))

I'm quite certain that you are looking to minimize

loglikelihood(mu,sigma) = - sum(log.(model.(sample, mu,sigma)))

where you are missing the log, given that your definition of model still has the exponential in it. Then, for fixed sigma, maximizing the loglikelihood corresponds to minimizing all the (x-mu)^2/sigma^2 terms in the sum, which is called "least squares" fitting. (This is also where LsqFit.jl gets its name.)
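
Spelled out for this Gaussian model, the derivation is one line:

-log(model(x, mu, sigma)) = (x - mu)^2 / (2*sigma^2) + log(sigma) + log(2*pi)/2

so, summing over the sample with sigma held fixed, everything except the sum of squared terms is a constant.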

RafaelArutjunjan avatar Oct 25 '20 22:10 RafaelArutjunjan

that is right. log was missing.

Hm, I'm confused by what you said.

llh(mu, sigma) = sum(x -> (x-mu)^2/(2*sigma^2), sample) + length(sample)*log(sigma)

is not equivalent to minimizing

bincenters = (bins[1:end-1] + bins[2:end]) / 2
lsq(mu, sigma) = @. (model(bincenters, mu, sigma) - y)^2

The first is the unbinned LLH, the second is Lsq.

Do you agree with that?

Do you know if the first one can be called using the package?

mmikhasenko avatar Oct 25 '20 22:10 mmikhasenko

llh(mu, sigma) = sum(x -> (x-mu)^2/(2*sigma^2), sample) + length(sample)*log(sigma)

is not equivalent to minimizing

lsq(mu, sigma) = @. (model(bincenters, mu, sigma) - y)^2

The first is the unbinned LLH, the second is Lsq.

Do you agree with that?

No, imo the least-squares term should also be divided by sigma^2; then llh and lsq only differ by a constant.

You get the llh maximization if you do:

curve_fit(model, x, y, sigma.^-2, p0)

You get the "binned" lsq if you just leave out the weight vector sigma.^-2. Then LsqFit.jl assumes that all data points are weighted the same, which is the same as choosing sigma = ones(length(y)) in the loglikelihood.
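
Concretely, reusing model, xdata, ydata, and p0 from the fit1 example near the top of the thread (a sketch; sigma_y is a made-up vector of per-point uncertainties on ydata):

sigma_y = fill(0.01, length(ydata))   # assumed 1-sigma errors on ydata
wt = sigma_y .^ -2                    # LsqFit weights are 1/sigma^2
fit_w = curve_fit(model, xdata, ydata, wt, p0)
stderror(fit_w)                       # parameter uncertainties from the weighted fit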

RafaelArutjunjan avatar Oct 25 '20 22:10 RafaelArutjunjan

Mm,

  1. think of estimating both mu and sigma
  2. in the lsq, sigma is not the correct error for the residuals of y; sigma is the error of x...

mmikhasenko avatar Oct 26 '20 08:10 mmikhasenko

do we still not have a chi2 fit?

Moelf avatar Apr 01 '22 13:04 Moelf

I don't think this is in the scope of this package. You can get your maximum likelihood estimates from it if you assume Gaussian errors; otherwise, use Optim to get the maximum likelihood estimates and do hypothesis testing using another package or your own code. This package is specifically for non-linear least squares.
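
For the hypothesis-testing part, one standard option is a likelihood-ratio test via Wilks' theorem (a sketch; nll_null and nll_alt stand for the minimized negative loglikelihoods of two nested Optim fits):

using Distributions

lr = 2 * (nll_null - nll_alt)   # asymptotically chi-square under the null (Wilks)
pvalue = ccdf(Chisq(1), lr)     # dof = number of parameters fixed by the null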

pkofod avatar Sep 03 '22 19:09 pkofod