Vector Generalized Linear and Additive Models (Support for Multinomial)

Open Nosferican opened this issue 8 years ago • 27 comments

Looking at the common generalized linear models, the binomial distribution is implemented, but the multinomial (its general form) is not. It also seems that GLM.LinPred is constrained to be a vector. The multinomial form for GLM is here. I think it would be a good addition to the package.

Nosferican, Dec 15 '17
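
For reference, the multinomial (baseline-category) logit can be written as a vector GLM with K − 1 linear predictors per observation, taking category 1 as the baseline. This is standard textbook notation, not necessarily the form on the page referred to above; it shows why an n × (K − 1) matrix of linear predictors is needed rather than a single vector:

```latex
\eta_{ij} = x_i^\top \beta_j, \qquad j = 2, \dots, K, \qquad \beta_1 \equiv 0,

P(Y_i = k \mid x_i) = \frac{\exp(x_i^\top \beta_k)}{1 + \sum_{j=2}^{K} \exp(x_i^\top \beta_j)}, \qquad k = 1, \dots, K.
```

With K = 2 this reduces to ordinary logistic regression with the single linear predictor x_i^T β_2.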

Here is a draft that I am working on. If I can work out the last kinks, I could port it to GLM. Any help with them would be appreciated. https://gist.github.com/Nosferican/54727b20f870894a15ecfb28e45cc4bc

Nosferican, Apr 30 '18

@Nosferican what are the last kinks? I'm learning Julia and have great interest in GLMs and their generalizations, so I'd love to help out, if my speed doesn't interfere with your schedule.

hung-q-ngo, May 07 '18

I updated the gist... basically I need to check how the variance-covariance matrix is computed for the mlogit case.

Nosferican, May 07 '18
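
A minimal sketch of one way to get the variance-covariance matrix for the mlogit case, assuming category 1 is the baseline and the coefficients are stacked as vec(B), with B of size p × (K − 1). The function names are hypothetical; this is not the gist's code. Because the logit link is canonical for the multinomial, the Hessian of the log-likelihood does not involve y: each observation contributes (diag(π_i) − π_i π_i') ⊗ x_i x_i' to the information, and the vcov is the inverse of the sum.

```julia
using LinearAlgebra

# Baseline-category (multinomial) logit probabilities, category 1 as the baseline.
# X is n×p, B is p×(K-1) with column j holding β_{j+1}; returns an n×K matrix.
function mlogit_probs(X::AbstractMatrix, B::AbstractMatrix)
    η = hcat(zeros(size(X, 1)), X * B)   # n×K linear predictors; baseline column fixed at 0
    η = η .- maximum(η; dims = 2)        # subtract row maxima for a numerically stable softmax
    P = exp.(η)
    return P ./ sum(P; dims = 2)
end

# Log-likelihood for integer-coded responses y[i] ∈ 1:K.
function mlogit_loglik(X::AbstractMatrix, y::AbstractVector{<:Integer}, B::AbstractMatrix)
    P = mlogit_probs(X, B)
    return sum(log(P[i, y[i]]) for i in eachindex(y))
end

# Fisher information for vec(B): sum over i of (diag(π_i) - π_i π_i') ⊗ x_i x_i',
# where π_i holds the K-1 non-baseline probabilities. It does not depend on y.
function mlogit_information(X::AbstractMatrix, B::AbstractMatrix)
    n, p = size(X)
    m = size(B, 2)                       # K - 1 non-baseline categories
    P = mlogit_probs(X, B)
    info = zeros(p * m, p * m)
    for i in 1:n
        pr = P[i, 2:end]
        W = Diagonal(pr) - pr * pr'      # (K-1)×(K-1) multinomial covariance
        x = X[i, :]
        info .+= kron(W, x * x')         # block (j, k) is W[j, k] * x x'
    end
    return info
end

# Variance-covariance matrix of vec(B): inverse of the information.
mlogit_vcov(X::AbstractMatrix, B::AbstractMatrix) = inv(Symmetric(mlogit_information(X, B)))
```

At fitted coefficients B̂, standard errors would then be sqrt.(diag(mlogit_vcov(X, B̂))).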

@Nosferican judging from your comment here, ordinal LR is not yet supported, right? You mentioned something about isordered for LogitLink, which I couldn't parse.

hung-q-ngo, May 07 '18

That's correct. I haven't gotten to ordinal logistic regression yet. I decided to bundle the ones I have and implement ologit later on (along with GMM instrumental variables for non-linear models). I'm getting my third-chapter material together for my proposal defense (2SLS, absorb / within / fixed effects, between, first difference, random effects for panel data, etc.).

Nosferican, May 07 '18
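
For comparison, ordinal logistic regression (ologit, or polr in R's MASS) is usually written as the proportional-odds model below, with J − 1 increasing cut points θ_1 < ⋯ < θ_{J−1} shared across observations and a single coefficient vector β. This is the standard parametrization, assumed here rather than taken from the thread:

```latex
P(Y_i \le j \mid x_i) = F(\theta_j - x_i^\top \beta), \qquad F(z) = \frac{1}{1 + e^{-z}}, \qquad j = 1, \dots, J - 1,

\ell(\theta, \beta) = \sum_{i=1}^{n} \log\bigl[ F(\theta_{y_i} - x_i^\top \beta) - F(\theta_{y_i - 1} - x_i^\top \beta) \bigr], \qquad \theta_0 = -\infty,\ \theta_J = +\infty.
```

Each probability P(Y_i = j) is the difference of two adjacent cumulative probabilities, which is what the log-likelihood sums.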

Fixed the vcov issue, so now I will be working on getting everything together. Hopefully I can get a beta out in the near future. I will update the gist with the latest code for multinomial, which can be used to port it to GLM.

Nosferican, May 09 '18

@Nosferican: I've read enough (about VGLM and Julia) to understand your code. Is the one in the gist the latest version? Do you need help with anything else? In addition to multinomial GLM, is multivariate Gaussian also a use case / test case?

hung-q-ngo, May 14 '18

I finished pinning down the last few things and finished writing the dissertation chapter on it, so now it is all about bundling everything together. As for the multivariate Gaussian question: in the most general VGLM case, one can specify a different distribution and link for each response / linear predictor. I think a sensible default would be to dispatch on the response type. The ones I did were the most commonly used cases, but once it gets ported here, one could build the general setup for other, less common cases.

Nosferican, May 14 '18
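
One way to read "dispatch on the response type" is a small set of methods that pick a default family, and with it the number of linear predictors, from the element type of the response. This is a minimal stdlib-only sketch: the type and function names are hypothetical (not GLM.jl or Econometrics.jl API), the Integer-to-Poisson default is an assumption for count data, and String responses stand in for the CategoricalVector responses used for the multinomial case.

```julia
# Hypothetical response families; names are illustrative only.
abstract type ResponseFamily end
struct GaussianFamily <: ResponseFamily end     # identity link, one linear predictor
struct BernoulliFamily <: ResponseFamily end    # logit link, one linear predictor
struct PoissonFamily <: ResponseFamily end      # log link, one linear predictor
struct MultinomialFamily <: ResponseFamily      # baseline-category logit, K - 1 linear predictors
    levels::Vector{String}
end

# Pick a default family from the element type of the response.
# Bool <: Integer in Julia; the Bool method is more specific, so it wins for Bool vectors.
default_family(y::AbstractVector{<:AbstractFloat}) = GaussianFamily()
default_family(y::AbstractVector{Bool}) = BernoulliFamily()
default_family(y::AbstractVector{<:Integer}) = PoissonFamily()
default_family(y::AbstractVector{<:AbstractString}) = MultinomialFamily(sort(unique(String.(y))))

# Number of linear predictors (columns of the η matrix) each family needs.
n_linear_predictors(::ResponseFamily) = 1
n_linear_predictors(f::MultinomialFamily) = length(f.levels) - 1
```

Because each family reports how many linear predictors it needs, a fitting routine could allocate an n × n_linear_predictors(family) matrix instead of a vector, which is the generalization of LinPred raised in the opening post.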

@Nosferican: any update on your vector GLM implementation? I am thinking of working on ordinal regression, and that would benefit tremendously from VGLM.

hung-q-ngo, Jun 22 '18

Hey, so I have the code working on the alpha. CategoricalArrays had an issue, which was patched yesterday, and this morning StatsModels released a patch to support 0.7. I ran my code with the tagged versions and everything is working. I still need to clean up the API and add the other components, but hopefully I will get to it this weekend (depending on how much time I have). If you wanna contribute an ologit implementation, that would be welcome. I can take the code and put it in the same framework. Will keep you updated.

Nosferican, Jun 22 '18

I just committed a first draft of the components for vectorized GLM that you can take a look at. Any comments are appreciated. I will keep bringing all the code I have developed together and making it compatible with the latest version. https://github.com/JuliaEconometrics/Econometrics.jl/blob/master/src/GeneralizedLinearModels.jl

Nosferican, Jun 22 '18

Just to keep you up to date: you should be able to start using it and identifying issues with multinomial logistic regression now... Documentation and tests are coming soon... You can run it using nightly (you might need to check out the Distributions branch with ] add Distributions#aa/0.7):

] add https://github.com/JuliaEconometrics/Econometrics.jl
using CSV, DataFrames, StatsBase, StatsModels, Econometrics
data = CSV.read("filename_to_test.csv"); # outcome variable should be `AbstractCategoricalVector`
formula = @formula(outcome ~ exogenous_variables); # replace exogenous_variables with the regressors, e.g. x1 + x2
model = EconometricsModel(formula, data);
coeftable(model)

Linear and Poisson models are working as well... The StatsBase API methods currently follow basic rules... I will rework these after merging the panel data estimators.

Nosferican, Jul 02 '18

Got it. I'm busy with things at work for a few days; I will get back to reading/trying this out as soon as I can. Thanks for the update.

hung-q-ngo, Jul 02 '18

Hi there, I'm interested in taking this for a spin in Julia 0.7.

I run this code:

using DataFrames
using StatsBase
using StatsModels
using Econometrics

And get this error:

[ Info: Precompiling Econometrics [3a2a89cb-daa6-4aaa-96ef-7853daeb1b7c]
┌ Warning: Package Econometrics does not have DataFrames in its dependencies:
│ - If you have Econometrics checked out for development and have
│   added DataFrames as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with Econometrics
└ Loading DataFrames into Econometrics from project dependency, future warnings for Econometrics are suppressed.
WARNING: Method definition stderror(StatsBase.StatisticalModel) in module StatsBase at /home/jock/.julia/packages/StatsBase/NzjNi/src/statmodels.jl:125 overwritten in module Econometrics at /home/jock/.julia/packages/Econometrics/y4Nin/src/GeneralizedLinearModels.jl:603.
ERROR: LoadError: LoadError: syntax: invalid assignment location "model_distribution(model) <: Multinomial && varlist[:response]"
Stacktrace:
 [1] include at ./boot.jl:317 [inlined]
 [2] include_relative(::Module, ::String) at ./loading.jl:1038
 [3] _broadcast_getindex at ./sysimg.jl:29 [inlined]
 [4] #17 at ./broadcast.jl:922 [inlined]
 [5] ntuple at ./tuple.jl:158 [inlined]
 [6] tuplebroadcast at ./broadcast.jl:922 [inlined]
 [7] copy at ./broadcast.jl:920 [inlined]
 [8] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple},Nothing,typeof(Econometrics.include),Tuple{Tuple{String,String,String}}}) at ./broadcast.jl:724
 [9] top-level scope at none:0
 [10] include at ./boot.jl:317 [inlined]
 [11] include_relative(::Module, ::String) at ./loading.jl:1038
 [12] include(::Module, ::String) at ./sysimg.jl:29
 [13] top-level scope at none:2
 [14] eval at ./boot.jl:319 [inlined]
 [15] eval(::Expr) at ./client.jl:399
 [16] top-level scope at ./none:3
in expression starting at /home/jock/.julia/packages/Econometrics/y4Nin/src/EconometricsModel.jl:1
in expression starting at /home/jock/.julia/packages/Econometrics/y4Nin/src/Econometrics.jl:33

Any ideas?

JockLawrie, Sep 13 '18

stderr is now a Base name for the I/O stream (it used to be upper case, STDERR), so StatsBase renamed its standard-error function to stderror. I can probably fix it in a few minutes to bring the package to 1.0 compatibility. Will ping you in a bit.

Nosferican, Sep 13 '18
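
Separately from the stderror warning, the ERROR line in the log above points at an assignment used as the right-hand operand of &&. Because && binds more tightly than =, such a line is read as (condition && varlist[:response]) = value, which Julia 0.7 rejects as an invalid assignment location. A minimal sketch of the problem and two valid forms, with varlist and is_multinomial as hypothetical stand-ins for the package's own names:

```julia
# Hypothetical stand-ins for the names in the error message.
varlist = Dict{Symbol,Any}()
is_multinomial = true

# `&&` binds tighter than `=`, so this line would mean `(is_multinomial && varlist[:response]) = :outcome`
# and fails with "syntax: invalid assignment location":
# is_multinomial && varlist[:response] = :outcome

# Parenthesizing the assignment makes it the right-hand operand of `&&`:
is_multinomial && (varlist[:response] = :outcome)

# An explicit `if` block is the equivalent long form:
if is_multinomial
    varlist[:response] = :outcome
end
```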

Great, thanks (and thanks for the speedy response!)

JockLawrie, Sep 13 '18

Try it now (use v"1.0.0", but v"0.7" will work too)

]rm Econometrics
]add https://github.com/JuliaEconometrics/Econometrics.jl#master
using Econometrics

It is still WIP, so definitely not production ready, but feedback would be great. I am re-exporting StatsBase, DataFrames, and StatsModels, so there is no need for using those anymore. You can use it as follows:

model = EconometricsModel(::Formula, ::AbstractDataFrame;
                          contrasts::Dict{Symbol,AbstractContrasts} =
                              Dict{Symbol,AbstractContrasts}())
coeftable(model) # Most of the StatsBase API is implemented

For multinomial logistic regression, just make sure the response is an AbstractCategoricalVector (its element type can be a Union with Missing), e.g., by calling categorical!(data, response).

Nosferican, Sep 13 '18

Thanks, that works. I've posted some findings over at Econometrics.jl#1

JockLawrie, Sep 13 '18

Where is the associated code? I don't see an Econometrics package under https://github.com/JuliaEconometrics.

RossBoylan, Apr 18 '19

For now, a draft is at https://github.com/Nosferican/Econometrics.jl. I'm waiting for StatsModels to tag a release and should be releasing a beta soon after.

Nosferican, Apr 18 '19

Are there plans to move ordinal/multinomial functionality into GLM.jl or is it staying in Econometrics?

Tokazama, Dec 11 '19

I don't know if there are plans to back-port those. For mlogit, GLM would have to refactor some of its code to allow VGLM. For ologit (polr), I would have to see if I can finally get the analytical solution for the Hessian; otherwise it would need to introduce a dependency on some solver (e.g., Optim / NLSolver) for the Hessian.

Nosferican, Dec 11 '19
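
A sketch of a chain-rule route to an analytic gradient and Hessian for the proportional-odds model, assuming the standard parametrization P(y ≤ j | x) = F(θ_j − x'β) with a logistic F and parameters stacked as [θ; β]. Each observation's log-likelihood depends on the parameters only through a = θ_{y} − x'β and b = θ_{y−1} − x'β, both linear in the parameters, so the Hessian is a weighted sum of outer products of their gradients. The function name is hypothetical; this is neither the closed-form solution from the dissertation mentioned in this thread nor the Econometrics.jl code, and it does not enforce θ_1 < ⋯ < θ_{J−1}.

```julia
using LinearAlgebra

# Logistic cdf F(z) and density f(z) = F(z)(1 - F(z)).
logistic(z) = 1 / (1 + exp(-z))
logistic_pdf(z) = (F = logistic(z); F * (1 - F))

# Hypothetical helper: log-likelihood, gradient and Hessian of the proportional-odds model
#   P(y ≤ j | x) = F(θ_j - x'β),  j = 1, ..., J - 1,  y ∈ 1:J,
# with parameters stacked as [θ_1, ..., θ_{J-1}, β_1, ..., β_p].
function ologit_loglik_grad_hess(X::AbstractMatrix, y::AbstractVector{<:Integer},
                                 θ::AbstractVector, β::AbstractVector)
    n, p = size(X)
    J = length(θ) + 1
    d = J - 1 + p
    cuts = [-Inf; θ; Inf]                # cuts[j + 1] = θ_j, with θ_0 = -Inf and θ_J = +Inf
    ll, g, H = 0.0, zeros(d), zeros(d, d)
    for i in 1:n
        x = X[i, :]
        xb = dot(x, β)
        a = cuts[y[i] + 1] - xb          # upper difference θ_{y_i} - x'β
        b = cuts[y[i]] - xb              # lower difference θ_{y_i - 1} - x'β
        D = logistic(a) - logistic(b)    # P(y = y_i | x_i)
        fa, fb = logistic_pdf(a), logistic_pdf(b)
        dfa = fa * (1 - 2 * logistic(a)) # f'(a); zero when a = +Inf
        dfb = fb * (1 - 2 * logistic(b)) # f'(b); zero when b = -Inf
        # First and second derivatives of log D with respect to a and b.
        la, lb = fa / D, -fb / D
        laa = dfa / D - la^2
        lbb = -dfb / D - lb^2
        lab = fa * fb / D^2
        # Gradients of a and b with respect to the stacked parameters (both are linear).
        ga, gb = zeros(d), zeros(d)
        y[i] < J && (ga[y[i]] = 1.0)
        y[i] > 1 && (gb[y[i] - 1] = 1.0)
        ga[J:end] .= -x
        gb[J:end] .= -x
        ll += log(D)
        g .+= la .* ga .+ lb .* gb
        H .+= laa .* (ga * ga') .+ lbb .* (gb * gb') .+ lab .* (ga * gb' .+ gb * ga')
    end
    return ll, g, H
end
```

With g and H available, a plain Newton iteration (v ← v − H \ g on the stacked parameter vector v) is one way to fit the model without an external solver; step-size safeguards and the ordering constraint on θ are not shown.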

Is the analytic solution for the Hessian the internal issue, or is it related to parsing the categorical variables in the formula? If it's a formula-related thing, this may be worth pursuing, because we should really have reasonably consistent behavior across packages using StatsModels.

Tokazama, Dec 11 '19

It's just a beast of an analytical solution... I gave up last time after a month of trying to implement it every day. I found yet another dissertation that has the closed-form solution, so I will give it another shot when I have time this month.

Nosferican, Dec 11 '19

I'm definitely not the one who should be implementing it, but feel free to ping me when you need someone to test it or review code.

Tokazama, Dec 11 '19

@Nosferican Any chance you still have the reference handy? I'm not sure if I will take a crack at this, but I am interested in getting an idea of how difficult it might be.

frankier, Nov 27 '23