HypothesisTests.jl

Classical Hypothesis Tests: Wald, LR, LM

azev77 opened this issue 4 years ago

@nosferican mentioned this in #121 and also on discourse. I coded up some classical hypothesis tests (HT). I need help w/ the score test. Does anyone know how to obtain the log-likelihood functions after glm?

using DataFrames, GLM, Distributions

n = 7; k = 1; X = randn(n, k); β = [2; -4]; σ_e = 10.1;
Y= [ones(n) X]*β + σ_e*randn(n);


d = DataFrame(X1= X[:,1], Y=Y);
#d = DataFrame(X1= X[:,1], X2= X[:,2], Y=Y);
#
f0 = @formula(Y ~ 1)
fA = @formula(Y ~ 1 + X1)
#
mH0 = lm(f0, d)
mHA = lm(fA, d)
#
mH0 = glm(f0, d, Normal())
mHA = glm(fA, d, Normal())

# Likelihood Ratio Test
function HT_LR(mH0, mHA)
    LR = 2.0*(loglikelihood(mHA) - loglikelihood(mH0))  |> abs
    df = dof_residual(mH0) - dof_residual(mHA)          |> abs
    pv = 1 - cdf(Chisq(df), LR)
    return LR, pv, df
end
#
HT_LR(mH0, mHA) #better practice
HT_LR(mHA, mH0)
# Wald Test
"https://en.wikipedia.org/wiki/Wald_test"
"H0: R*θ = r"
"HA: R*θ ≂̸ r"
#
function HT_Wald(mHA, R, r, V)
    θ̂ = coef(mHA)
    A = R*θ̂ - r
    W = A' * inv(R*V*R') * A
    df = size(R, 1)             # number of restrictions
    pv = 1 - cdf(Chisq(df), W)
    return W, pv, df
end
R = [0 1]
r = [0]
HT_Wald(mHA, R, r, vcov(mHA))

"LM aka Score test Matlab code"
G = gradp(@nll_lin,theta_r_full,datamat,1);
H_r = HessMp(@nll_lin,theta_r_full,datamat,1);
V_r = inv(H_r);
LM = G*V_r*G';
LM_p = 1-chi2cdf(LM,2);
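For the Normal linear model in the example above, the score (LM) statistic has a closed auxiliary-regression form, LM = n·R²: regress the restricted-model residuals on the full design matrix. A hedged Julia sketch of this (the function HT_LM and its interface are illustrative, not part of GLM, and assume homoskedastic Normal errors):

```julia
using Distributions

# Score (LM) test via the auxiliary-regression form LM = n * R².
# Sketch only; assumes the restricted model is nested in the full one
# and that both design matrices include any intercept column explicitly.
function HT_LM(y, X_full, X_restricted)
    e0 = y - X_restricted * (X_restricted \ y)    # restricted residuals
    ê  = X_full * (X_full \ e0)                   # auxiliary-regression fitted values
    R2 = sum(abs2, ê) / sum(abs2, e0)
    LM = length(y) * R2
    df = size(X_full, 2) - size(X_restricted, 2)  # number of restrictions
    pv = 1 - cdf(Chisq(df), LM)
    return LM, pv, df
end
```

For the example above this would be called as `HT_LM(Y, [ones(n) X], ones(n, 1))`.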


azev77 avatar May 16 '20 06:05 azev77

Maybe something for https://discourse.julialang.org/ ?

mschauer avatar May 16 '20 06:05 mschauer

Top! https://discourse.julialang.org/t/classical-hypothesis-tests-wald-lr-lm/39575

mschauer avatar May 16 '20 07:05 mschauer

@mschauer I posted this as sample code in case someone is working on a PR. Are there plans to include Wald/LR/LM in HypothesisTests.jl?

azev77 avatar May 16 '20 07:05 azev77

See https://github.com/JuliaStats/StatsModels.jl/pull/162 for the likelihood ratio test. I think other tests for models should go to StatsModels or GLM rather than HypothesisTests, as the latter shouldn't depend on other packages.

nalimilan avatar May 16 '20 17:05 nalimilan

I agree. HypothesisTests.jl shouldn't depend on other packages. My code for the LR test works for any estimated model that supports loglikelihood(m). The only thing it depends on is cdf(Chisq(df), LR).

# Likelihood Ratio Test
function HT_LR(mH0, mHA)
    LR = 2.0*(loglikelihood(mHA) - loglikelihood(mH0))  |> abs
    df = dof_residual(mH0) - dof_residual(mHA)          |> abs
    pv = 1 - cdf(Chisq(df), LR)
    return LR, pv, df
end
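To illustrate that point, HT_LR only touches loglikelihood and dof_residual, so any type implementing those two StatsBase methods works. A sketch with a made-up model type (ToyFit is invented for illustration; HT_LR is repeated from above so the block is self-contained):

```julia
using Distributions
import StatsBase: loglikelihood, dof_residual

# Minimal made-up model type: HT_LR only needs these two methods.
struct ToyFit
    ll::Float64       # stored log-likelihood
    dofres::Float64   # residual degrees of freedom
end
loglikelihood(m::ToyFit) = m.ll
dof_residual(m::ToyFit) = m.dofres

# Likelihood ratio test (same as above)
function HT_LR(mH0, mHA)
    LR = 2.0*(loglikelihood(mHA) - loglikelihood(mH0))  |> abs
    df = dof_residual(mH0) - dof_residual(mHA)          |> abs
    pv = 1 - cdf(Chisq(df), LR)
    return LR, pv, df
end

HT_LR(ToyFit(-12.3, 6), ToyFit(-10.1, 5))  # LR ≈ 4.4, df = 1
```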

azev77 avatar May 16 '20 17:05 azev77

I think a trinity tests package would be nice. I would be happy to contribute to this in the future.

PS: I think it would also be nice to have an @formula syntax for this, such as @formula(x = 0, y = 0), and build a matrix of restrictions based on that.
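A restriction builder along those lines might look like this sketch (the name restriction_matrix and its interface are hypothetical; it assumes names_all is the model's coefnames in coefficient order):

```julia
# Build (R, r) for H0: each named coefficient equals zero.
# Hypothetical helper, not an existing StatsModels API.
function restriction_matrix(names_all::Vector{String}, restricted::Vector{String})
    R = zeros(length(restricted), length(names_all))
    for (i, nm) in enumerate(restricted)
        j = findfirst(==(nm), names_all)
        j === nothing && error("unknown coefficient: $nm")
        R[i, j] = 1.0
    end
    return R, zeros(length(restricted))
end

restriction_matrix(["(Intercept)", "X1"], ["X1"])  # → ([0.0 1.0], [0.0])
```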

pdeffebach avatar May 16 '20 19:05 pdeffebach

Aye. I have code for Wald testing, and there are the hypothesis contrasts in StatsModels that could be expanded. We should work on specifying linear combinations and hypothesis tests based on those.

Nosferican avatar May 16 '20 21:05 Nosferican

Reopening; it makes sense to discuss this here.

mschauer avatar May 17 '20 00:05 mschauer

@pdeffebach I'd prefer the classic trinity tests be in this repo.

Can we return to the conversation about where hypothesis tests should live in Julia? To me the most obvious & natural place is in a repo called "HypothesisTests.jl". Just like most distributions are in "Distributions.jl". If not, then what's the purpose of "HypothesisTests.jl"?

I feel like keeping as many hypothesis tests as possible in this repo will encourage others to contribute. It would certainly make me more likely to wanna contribute.

I don't see why dependencies are necessarily an issue. I wrote code for an LRT that doesn't depend on anything (except the chi-squared distribution).

azev77 avatar May 17 '20 05:05 azev77

Our plan was to move all model-related functions to a new StatsModelsBase package, which would contain the essential parts of the current StatsModels. In that case, HypothesisTests would have to depend on StatsModelsBase to be able to call deviance and co. (In the long term, StatsBase is supposed to go away in favor of Statistics + StatsModelsBase and possibly others.) Anyway, I don't see the problem with having these functions in one package or another: either way you'll have to read the StatsModels documentation to learn how to fit models.

Cc: @kleinschmidt

nalimilan avatar May 17 '20 13:05 nalimilan

@nalimilan this is interesting, I wanna learn more.

One advantage of having hypothesis tests together (when possible) is that it makes it easier for the user to discover & compare. There are often many different tests for a given hypothesis, each with its own advantages. For example, Wald has better power than the score test, but the score test has better size than Wald. Keeping tests together makes it easier to discover them, compute each test's power/size, etc.

Mathematica has several goodness of fit tests. You can automatically compare results:

[Screenshot: Mathematica table comparing the results of several goodness-of-fit tests]

Furthermore, they can automate selecting the optimal test for a given dataset: DistributionFitTest[data, dist, Automatic] will choose the most powerful test that applies to data and dist for a general alternative hypothesis.

I feel like these things are easier to cooperate on if we work together in the same repo, whichever repo is decided on...

azev77 avatar May 19 '20 18:05 azev77

Yes but they could live in StatsModels.

nalimilan avatar May 20 '20 08:05 nalimilan

I don't have much of an opinion one way or another. I suspect that even a trimmed down StatsModelsBase would still contain more than just the function definitions, so I could see being less willing to take it on as a dependency here. But on the other hand I'm sympathetic to the desire to keep hypothesis tests in one place, and this package is the natural place for them.

kleinschmidt avatar May 20 '20 17:05 kleinschmidt

@nalimilan @kleinschmidt The score (LM) test requires the score function & information matrix of a model. Looking around JuliaStats, I found StatsBase.informationmatrix(mHA) and StatsBase.score(mHA). They currently don't work w/ GLM. Will these eventually be able to return the score/information for any appropriate fitted model?
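For the Normal linear model these quantities have closed forms, so a GLM implementation would be straightforward. A sketch of what they would compute (not the actual StatsBase/GLM implementation; function names are illustrative):

```julia
# Score and expected information for y = Xβ + ε, ε ~ N(0, σ²I),
# with respect to β, evaluated at a given (β, σ²). Sketch only.
score_lin(X, y, β, σ²) = (X' * (y - X * β)) ./ σ²
information_lin(X, σ²) = (X' * X) ./ σ²
```

At the MLE β̂ = X \ y the score is (numerically) zero, and inv(information_lin(X, σ²)) is the familiar σ²·inv(X'X) covariance.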

azev77 avatar May 20 '20 18:05 azev77

I added those to the API as those are necessary components for robust variance-covariance estimators vcov(model; ...). GLM just needs to implement those. I believe Econometrics.jl does implement those.

Nosferican avatar May 20 '20 18:05 Nosferican

StatsModels 0.6.12 has just been released which contains the LR test (from JuliaStats/StatsModels.jl#162)

kleinschmidt avatar Jul 16 '20 17:07 kleinschmidt

My field uses a lot of likelihood ratio and related test statistics, Wald etc. I wonder if it's worth splitting out a package for that; for reference, see Chapter 3 of: https://arxiv.org/pdf/1007.1727.pdf

Essentially, given a likelihood function L(θs::Vector{Float64})::Float64, our likelihood ratio would be a 1D function of the parameter of interest (POI), call it μ; assume it's the first among the original θs:

const global_MLL = L(MLE_θs)

function LR(μ::Float64)
    other_θs = # conditional estimator while fixing POI μ
    L(vcat(μ, other_θs)) / global_MLL
end

then we develop test statistics t_0 and t_μ and obtain p-values and intervals from the distribution of the test statistic, etc.
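As a concrete (hedged) instance of that pattern, here is the profile likelihood ratio for the mean μ of a Normal sample, where σ is the nuisance parameter and its conditional MLE given μ happens to be available in closed form, so no numerical optimizer is needed (profile_LR is a toy sketch, not anything from an existing package):

```julia
using Distributions, Statistics

# Profile likelihood ratio LR(μ) = L(μ, σ̂_μ) / L(μ̂, σ̂), where σ̂_μ is
# the conditional MLE of σ with the mean fixed at μ. Toy sketch.
function profile_LR(data::Vector{Float64}, μ::Float64)
    ℓ(m, s) = sum(logpdf.(Normal(m, s), data))
    μ̂  = mean(data)
    σ̂  = sqrt(mean(abs2, data .- μ̂))   # global MLE of σ
    σ̂μ = sqrt(mean(abs2, data .- μ))    # conditional MLE of σ given μ
    return exp(ℓ(μ, σ̂μ) - ℓ(μ̂, σ̂))
end
```

Under H0: μ = μ0, -2*log(profile_LR(data, μ0)) is asymptotically Chisq(1), per Wilks; by construction profile_LR(data, mean(data)) == 1.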

Moelf avatar May 11 '22 19:05 Moelf