MixedModels.jl
MixedModels.jl copied to clipboard
Finding your way around for R users
In private email Martin Maechler asked me several questions about using Julia and the MixedModels package. Because the answers to which may be of more general interest, I created this issue for them so that others may see the answers and comment on them.
I am assuming that a recent version of Julia 0.2 devel is installed. The easiest way to do that for Ubuntu users is using the julianightlies PPA and the julia-deps PPA. As of today, the "nightlies" version produces
$ /usr/bin/julia
_
_ _ _(_)_ | A fresh approach to technical computing
(_) | (_) (_) | Documentation: http://docs.julialang.org
_ _ _| |_ __ _ | Type "help()" to list help topics
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 0.2.0+nightly20130502.9a72fab
_/ |\__'_|_|_|\__'_| | -Linux-x86_64 (2013-05-03 19:14:05)
|__/ |
on startup. There have been too many changes since the 0.1.2 release to try using that. Please ensure that your version string starts with 0.2.0
The system for attaching packages has changed in version 0.2.0. To install this package use
Pkg.add("MixedModels")
After that, attaching this and other packages is done with the using
directive.
using MixedModels
More than one package can be listed. For testing I often use
using RDatasets, MixedModels
This directive doesn't return instantaneously but the delay is due to the size and complexity of the DataFrames
package upon which both of these depend.
As in R
, attaching a package also attaches its dependencies but there are distinctions between those packages that are visible from the REPL and those that are visible only inside the package listed in using
. The equivalent of R's ls()
is whos()
and it can take an argument which is the name of a Module. (Most packages are organized as a Module - which is a namespace - but you can have Modules that are not separate packages).
julia> using MixedModels
julia> whos()
Base Module
Core Module
DataFrames Module
Distributions Module
GZip Module
Main Module
MixedModels Module
NLopt Module
OptionsMod Module
Stats Module
julia> whos(MixedModels)
LMMsimple DataType
LMMsplit DataType
MixedModel DataType
MixedModels Module
SparseMatrixRSC DataType
VarCorr Function
fixef Function
ranef Function
reml Function
It happens that MixedModels
attaches the DataFrames
and Distributions
namespaces in the REPL as well as its own namespace. However is does not the NLopt
namespace in the REPL. I am not sure how to discover this other than by reading the source code or experimenting with names. The directive using MixedModels
evaluates the source file
joinpath(Pkg.dir(), "MixedModels", "src", "MixedModels.jl")
which, in turn, has several calls to include
include("RSC.jl") # regular sparse column-oriented matrices
include("utils.jl") # utilities to deal with the model formula
include("linearmixedmodel.jl")
include("LMM.jl")
The fact that this driver file has the directive
using DataFrames, Distributions
both before and after the directive
module MixedModels
means that these namespaces will be visible in the REPL (because of the first using
) and withing the module itself, because of the second using
.
You can always try evaluating a name in the namespace to see if it is visible in the REPL.
julia> whos(NLopt)
ForcedStop DataType
NLOPT_VERSION VersionNumber
NLopt Module
Opt DataType
algorithm Function
algorithm_name Function
default_initial_step! Function
equality_constraint! Function
force_stop Function
force_stop! Function
ftol_abs Function
ftol_abs! Function
ftol_rel Function
ftol_rel! Function
inequality_constraint! Function
initial_step Function
initial_step! Function
local_optimizer! Function
lower_bounds Function
lower_bounds! Function
max_objective! Function
maxeval Function
maxeval! Function
maxtime Function
maxtime! Function
min_objective! Function
optimize Function
optimize! Function
population Function
population! Function
remove_constraints! Function
stopval Function
stopval! Function
upper_bounds Function
upper_bounds! Function
vector_storage Function
vector_storage! Function
xtol_abs Function
xtol_abs! Function
xtol_rel Function
xtol_rel! Function
julia> algorithm
ERROR: algorithm not defined
The fully qualified name will be accessible
julia> NLopt.algorithm
# methods for generic function algorithm
algorithm(o::Opt) at /home/bates/.julia/NLopt/src/NLopt.jl:143
Martin asked about the equivalent of R's installed.packages()
which is
julia> Pkg.installed()
["GLM"=>"aefc6c1c616f784ddec920ea07ee6e3ab712bb50","Tk"=>v"0.0.1","MathProg"=>v"0.0.0","MAT"=>v"0.2.0","TextWrap"=>v"0.0.0","NLopt"=>v"0.0.0","Color"=>v"0.2.0","Mustache"=>v"0.0.0","Benchmark"=>v"0.0.0","Zlib"=>v"0.0.0","Debug"=>v"0.0.0","PyCall"=>v"0.0.0","Graphs"=>v"0.2.2","Gadfly"=>v"0.0.0","Compose"=>v"0.0.0","Metis"=>v"0.0.0","Cubature"=>v"0.1.0","CoinMP"=>v"0.0.1","GetC"=>v"0.0.0","Winston"=>v"0.0.0","RDatasets"=>"9000825fff7b059afc1973201d48e46521990a69","Profile"=>v"0.2.0","DataFrames"=>v"0.2.1","GZip"=>v"0.2.0","StrPack"=>v"0.0.0","IniFile"=>v"0.0.0","JSON"=>v"0.1.0","BinDeps"=>v"0.0.1","OpenGL"=>v"0.0.0","Rif"=>v"0.0.0","Iterators"=>v"0.0.0","Distributions"=>"1ba699146607c6c0a41e9b2c2266aee4c99e0141","HDF5"=>"8b7344bee440f3467adb2be6afe478677b25f224","Stats"=>v"0.2.0","Codecs"=>v"0.0.0","Clp"=>v"0.0.1","SemidefiniteProgramming"=>v"0.0.0","Clang"=>"6e995712fc89700190d7f79aed09e94e10752188","Options"=>v"0.2.0","ArgParse"=>v"0.2.1","Optim"=>"1dd33924ef113d960d8ffe981519ea78558f8681","DataStructures"=>v"0.2.3","Cairo"=>v"0.0.1","MathProgBase"=>v"0.0.0","Calculus"=>v"0.1.1"]
which is a hash or Dict
julia> typeof(Pkg.installed())
Dict{String,Union(String,VersionNumber)}
A more readable display is returned by
julia> Pkg.status()
ArgParse 0.2.1
Benchmark 0.0.0
BinDeps 0.0.1
Cairo 0.0.1
Clang master
Clp 0.0.1
Codecs 0.0.0
CoinMP 0.0.1
Color 0.2.0
Compose 0.0.0
Cubature 0.1.0
Debug 0.0.0
Distributions master
GLM master
GZip 0.2.0
Gadfly 0.0.0
GetC 0.0.0
Graphs master
HDF5 master
Iterators 0.0.0
JSON 0.1.0
MAT 0.2.0
MathProg 0.0.0
Metis 0.0.0
Mustache 0.0.0
NLopt 0.0.0
OpenGL 0.0.0
Optim master
Options 0.2.0
Profile master
PyCall 0.0.0
RDatasets master (dirty)
Rif 0.0.0
SemidefiniteProgramming 0.0.0
Stats 0.2.0
StrPack 0.0.0
TextWrap 0.0.0
Tk 0.0.1
Winston 0.0.0
DataFrames master (dirty)
DataStructures 0.2.3
Zlib 0.0.0
MathProgBase 0.0.0
Calculus master
IniFile 0.0.0
I will be adding an lmer()
function soon. Right now the package has two user-defined types for simple linear mixed models, LMMsimple
and LMMsplit
because I was experimenting to see if the complexity of handling the fixed-effects part separately from the random-effects part of the model was warranted (and apparently it is). The LMMsimple
class throws both into a single sparse matrix and updatemu()
for that class is only four lines.
## Update L, solve for ubeta and evaluate mu
function updatemu(m::LMMsimple)
m.A.nzval[:] = m.anzv[:] # restore A to ZXt*ZXt', update A and L
chm_factorize!(m.L, pluseye!(chm_scale!(m.A, m.lambda, 3), 1:m.Zt.q))
m.ubeta[:] = (m.L \ (m.lambda .* m.Zty))[:]
m.mu[:] = m.Zt'*(m.lambda .* m.ubeta)
end
For LMMsplit
the random-effects part of the system matrix is stored as a sparse symmetric matrix and the fixed-effects part is a dense matrix.
The formula/data interface is slightly more verbose than in R. A formula must be enclosed in :( )
to retain it as an expression. Thus
julia> using RDatasets
julia> ds = data("lme4", "Dyestuff")
Written by version 2.15.3
Minimal R version: 2.3.0
WARNING: has(d,k) is deprecated, use haskey(d,k) instead.
in make_unique at /home/bates/.julia/DataFrames/src/utils.jl:47
30x2 DataFrame:
Batch Yield
[1,] "A" 1545.0
[2,] "A" 1440.0
[3,] "A" 1440.0
[4,] "A" 1520.0
[5,] "A" 1580.0
[6,] "B" 1540.0
[7,] "B" 1555.0
[8,] "B" 1490.0
[9,] "B" 1560.0
[10,] "B" 1495.0
[11,] "C" 1595.0
[12,] "C" 1550.0
[13,] "C" 1605.0
[14,] "C" 1510.0
[15,] "C" 1560.0
[16,] "D" 1445.0
[17,] "D" 1440.0
[18,] "D" 1595.0
[19,] "D" 1465.0
[20,] "D" 1545.0
[21,] "E" 1595.0
[22,] "E" 1630.0
[23,] "E" 1515.0
[24,] "E" 1635.0
[25,] "E" 1625.0
[26,] "F" 1520.0
[27,] "F" 1455.0
[28,] "F" 1450.0
[29,] "F" 1480.0
[30,] "F" 1445.0
julia> fm1 = LMMsplit(:(Yield ~ 1|Batch), ds)
Linear mixed model fit by maximum likelihood
logLik: -163.66352994061094, deviance: 327.3270598812219
Variance components: [1388.34, 2451.25]
Number of obs: 30; levels of grouping factors: [6]
Fixed-effects parameters: [1527.5]
The first time you do this the response will seem slow, which is because all the methods are being encountered for the first time and need to be compiled. Subsequent calls to will be much faster.
That was really very useful. whos() is now a "must know" for me.
Interesting that my julia version of April 26, was "way too old" and e.g. couldn't get data(.,.) to work... Using the Ubuntu PPAs instead indeed, was sufficient.... and much faster than the 'git pull ; make' ... aah it fails re-building 'make realclean; make' ... dance which would usually take many hours..
Thank you, Doug!
In Julia multiple values are returned from a function as a tuple. You can write
return (a, b)
and extract the values a and b with indexes. A common idiom is to assign the values of a returned function to a tuple of identifiers
n, p = size(X)
On Wed, Apr 20, 2016 at 12:13 AM Daijiang Li [email protected] wrote:
How do you return multiple named values so you can extract part of them later? In R, we can use return(list(a = a, b = b)). Thanks!
— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/dmbates/MixedModels.jl/issues/1#issuecomment-212259714
Thanks Prof. Bates!
I ended with return a Dict so I can extract values by name later.
I am converting Tony Ives' R code to do phylogenetic mixed models (mixed model with specific phylogenetic var-cov matrix) into Julia. As the R code is too slow and it is unlikely to be included into the lme4
package.
I am new to Julia, even though I attended your Julia workshop two years ago... The code is here. Even now, it is >10 times faster than R. But I wonder can you have a look at them and let me know how to improve/profile it?
Is it possible to let your mixed model to do such kind of analyses?
Thank you for your help! Daijiang
First I would suggest creating a package. Secondly don't pass around a huge number of arguments. If they are related to the same model then create a type for the model.
On Thu, Apr 21, 2016 at 10:31 AM Daijiang Li [email protected] wrote:
Thanks Prof. Bates! I ended with return a Dict so I can extract values by name later. I am converting Tony Ives' R code to do phylogenetic mixed models (mixed model with specific phylogenetic var-cov matrix) into Julia. As the R code is too slow and it is unlikely to be included into the lme4 package.
I am new to Julia, even though I attended your Julia workshop two years ago... The code is here https://github.com/daijiang/pglmm.jl. Even now, it is >10 times faster than R. But I wonder can you have a look at them and let me know how to improve/profile it?
Is it possible to let your mixed model to do such kind of analyses?
Thank you for your help!
Daijiang
— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/dmbates/MixedModels.jl/issues/1#issuecomment-212975338