StatsModels.jl

Why is the ContrastsMatrix matrix a Matrix{Float64}?

Open PharmCat opened this issue 3 years ago • 7 comments

Why is the matrix field of the struct ContrastsMatrix a Matrix{Float64}? In many cases, with DummyCoding() or FullDummyCoding(), it could be a BitMatrix or a SparseMatrixCSC{Bool, Int64}. For big datasets I tried something like this:

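using StatsModels, SparseArrays, LinearAlgebra  # sparse and I come from the stdlib packages

mutable struct OwnDummyCoding <: AbstractContrasts
# Dummy contrasts
end
# Sparse identity matrix with the base level's column dropped
function StatsModels.contrasts_matrix(C::OwnDummyCoding, baseind, n)
    sparse(I, n, n)[:, [1:(baseind-1); (baseind+1):n]]
end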

But I run out of memory because ContrastsMatrix tries to convert this to a Matrix{Float64}.
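To put rough numbers on the problem, here is a minimal sketch that compares the sparse indicator matrix with its dense Float64 copy. It uses only the SparseArrays and LinearAlgebra standard libraries, not StatsModels itself; the sizes (n = 2_000 for the demonstration and 10^5 levels for the arithmetic) are illustrative assumptions.

```julia
using SparseArrays, LinearAlgebra

n, baseind = 2_000, 1                        # small n so the dense copy still fits in memory
cols = [1:(baseind - 1); (baseind + 1):n]    # every column except the base level
S = sparse(I, n, n)[:, cols]                 # sparse Bool indicator columns
D = Matrix{Float64}(S)                       # the kind of dense copy that exhausts memory

dense_bytes  = sizeof(D)                     # n * (n - 1) * 8 bytes
sparse_bytes = Base.summarysize(S)           # grows roughly linearly with n

# Dense Float64 storage needed for 10^5 levels, in GB (plain arithmetic, nothing is allocated)
dense_gb = 10^5 * (10^5 - 1) * 8 / 1e9
```

The dense matrix grows quadratically with the number of levels, so at 10^5 levels it would need roughly 80 GB, while the sparse version stores only one nonzero per column.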

PharmCat, Jan 03 '22 21:01

Would it be possible to define it like this:

struct ContrastsMatrix{C <: AbstractContrasts, T, U, M}
    matrix::M
    termnames::Vector{U}
    levels::Vector{T}
    contrasts::C
    invindex::Dict{T,Int}
    function ContrastsMatrix(matrix::M,
                             termnames::Vector{U},
                             levels::Vector{T},
                             contrasts::C) where {M <: AbstractMatrix, U, T, C <: AbstractContrasts}
        allunique(levels) || throw(ArgumentError("levels must be all unique, got $(levels)"))
        invindex = Dict{T,Int}(x=>i for (i,x) in enumerate(levels))
        new{C,T,U,M}(matrix, termnames, levels, contrasts, invindex)
    end
end
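As a self-contained illustration of why a type parameter for the matrix field helps, the sketch below uses a hypothetical stand-in struct (ToyContrastsMatrix is not the StatsModels type and omits most of its fields). It only shows that a parametric field keeps whatever matrix type it is given.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical stand-in for the proposed struct; NOT the real StatsModels.ContrastsMatrix.
struct ToyContrastsMatrix{M <: AbstractMatrix, T}
    matrix::M            # stored as passed in, with no conversion to Matrix{Float64}
    levels::Vector{T}
end

n, baseind = 1_000, 1
cols = [1:(baseind - 1); (baseind + 1):n]
toy = ToyContrastsMatrix(sparse(I, n, n)[:, cols], collect(1:n))

is_sparse = toy.matrix isa SparseMatrixCSC{Bool}   # the sparse Bool storage is preserved
```

Any downstream code that reads the matrix would then have to accept an AbstractMatrix rather than assume dense Float64 storage.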

PharmCat, Jan 03 '22 23:01

@PharmCat how many contrast levels do you have? If this is for the grouping variable in MixedModels.jl, then there is the Grouping() pseudocontrast which avoids creating an actual matrix

palday, May 19 '22 03:05

> @PharmCat how many contrast levels do you have? If this is for the grouping variable in MixedModels.jl, then there is the Grouping() pseudocontrast which avoids creating an actual matrix

@palday

Hello! It can be more than 10^5. Actually, I'm working on Metida.jl, which helps me with some tasks where MixedModels.jl can't be used. I know this problem is solved in MixedModels, and Metida has a workaround too. I have seen Grouping in MixedModels.jl; maybe the Grouping code should be moved to StatsModels.jl and documented there (perhaps together with some other code from MixedModels, such as the use of "/" in terms). I also don't know why the matrix field of ContrastsMatrix is fixed as Matrix{Float64} and why it can't be more flexible.

I also can't find a roadmap for StatsModels. I think StatsModels is a core package of the JuliaStats ecosystem, but there is no information about its development plan toward version 1.0.

PharmCat, May 20 '22 14:05

The nesting syntax / is implemented in RegressionFormulae.jl

palday, May 20 '22 14:05

The implementation of Grouping() is quite simple: https://github.com/JuliaStats/MixedModels.jl/blob/621f88b1f594ea0827d9ac7e8628113dd2121bef/src/grouping.jl#L2-L34

Depending on the exact structure of your model, you might be able to skip using the full formula infrastructure and instead call a custom modelcols method directly -- this is how random effects and associated sparse matrices are constructed in MixedModels.
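As a rough sketch of the idea of skipping the formula machinery, the function below builds a sparse indicator matrix straight from a vector of group labels. The name sparse_indicators is hypothetical; it is not a StatsModels or MixedModels function, and it is not how either package implements modelcols.

```julia
using SparseArrays

# Hypothetical helper: one row per observation, one Bool column per level.
function sparse_indicators(x::AbstractVector)
    levels = sort!(unique(x))                              # sorted distinct labels
    index = Dict(l => j for (j, l) in enumerate(levels))   # label => column number
    rows = collect(1:length(x))
    cols = [index[v] for v in x]
    Z = sparse(rows, cols, fill(true, length(x)), length(x), length(levels))
    return Z, levels
end

Z, levels = sparse_indicators(["a", "b", "a", "c"])
```

Each row holds exactly one stored entry, so memory grows with the number of observations rather than with observations times levels.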

palday, May 20 '22 15:05

> The implementation of Grouping() is quite simple:

Yep, but this means that I should copy this code or include MixedModels as a dependency. Maybe place this functionality in StatsModels?

PharmCat, May 20 '22 17:05

There's nothing wrong with copying this code, but maybe @kleinschmidt has thoughts on whether it makes more general sense to include this in StatsModels?

palday, Jun 28 '22 03:06