GLM.jl
linear model with Float32
Hi GLM,
I am trying to use the lm function with the following command:
mydata = DataFrame(X=MyDataType.(1:3), Y=MyDataType.(11:13)); lm(@formula(Y ~ 1 + X), mydata)
This works fine with MyDataType = Float64, Int32, or Int64, but with Float32 I get the following error:
ERROR: MethodError: no method matching delbeta!(::DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}, ::Array{Float32,1})
Closest candidates are:
  delbeta!(::DensePredQR{T<:Union{Float32, Float64}}, ::Array{T<:Union{Float32, Float64},1}) where T<:Union{Float32, Float64} at /home/tooler/.julia/packages/GLM/FxTmX/src/linpred.jl:76
  delbeta!(::DensePredChol{T<:Union{Float32, Float64},#s14} where #s14<:LinearAlgebra.Cholesky, ::Array{T<:Union{Float32, Float64},1}) where T<:Union{Float32, Float64} at /home/tooler/.julia/packages/GLM/FxTmX/src/linpred.jl:130
  delbeta!(::DensePredChol{T<:Union{Float32, Float64},#s14} where #s14<:LinearAlgebra.CholeskyPivoted, ::Array{T<:Union{Float32, Float64},1}) where T<:Union{Float32, Float64} at /home/tooler/.julia/packages/GLM/FxTmX/src/linpred.jl:135
My environment is the following:
julia version 1.0.1
[38e38edf] GLM v1.0.1
The code probably needs some tweaks to be more generic. PR welcome.
@kleinschmidt Is it on purpose that the model matrix promotes to Float64 even though the data columns are Float32?
It's worth noting that lm(Matrix{Float32}, Vector{Float32}) actually works, so Float32 is not the problem per se. The problem is the mix of a Float64 design matrix with a Float32 response, which triggers the error.
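For reference, a minimal sketch contrasting the two call paths (the values are illustrative; this reflects the behavior on the versions reported in this thread):

using GLM, DataFrames

# Matrix/vector method: both arguments are Float32, so the fit goes through.
X = hcat(ones(Float32, 3), Float32.(1:3))  # design matrix with an intercept column
y = Float32.(11:13)
lm(X, y)

# Formula method: the model matrix is promoted to Float64 while the response
# stays Float32, which triggers the MethodError above.
mydata = DataFrame(X=Float32.(1:3), Y=Float32.(11:13))
lm(@formula(Y ~ 1 + X), mydata)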
I don't know, actually... I suspect it was just a sensible default. I think it could be overridden by specifying ModelMatrix{Matrix{Float32}} in fit, or by doing some promoting.
I'm not super inclined to take a deep dive into the current StatsModels code to fix this, though, given that (inshallah) it's going to be replaced by Terms 2.0 (JuliaStats/StatsModels.jl#71).
It's also hard to handle this in a sufficiently general way, since you might have a categorical variable that gets expanded according to a contrasts matrix; that matrix defaults to Float64, so you'd somehow have to figure out how to un-promote it to Float32 in the case where the only continuous variables are Float32.
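The contrasts issue can be seen directly with StatsModels (a sketch using the newer Terms-2.0 API; the toy data frame is made up):

using StatsModels, DataFrames, CategoricalArrays

df = DataFrame(y = Float32[1, 2, 3],
               x = Float32[1, 2, 3],
               g = categorical(["a", "b", "a"]))

# Expand the formula against the data's schema and build the model columns.
f = apply_schema(@formula(y ~ 1 + x + g), schema(df))
resp, mm = modelcols(f, df)
eltype(mm)  # Float64: the Float64 contrasts columns for g promote the whole model matrix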
Also to clarify what @andreasnoack said: this is an issue with StatsModels (which handles the formula), not with GLM.
I just encountered the same issue. This problem occurs when reading data from Stata's .dta files using StatFiles.jl. Stata seems to work with lower precision.
Glad the other lm method with no formula works with Float32. I would not document the formula method if it doesn't work with such a basic data type. Is now a good time to fix this?
I guess the question is: what should GLM do when it gets a y and X with heterogeneous eltypes? The current situation with (IIRC) a MethodError is not great, and all the fixes I can think of for StatsModels will not completely solve this problem; it would still be possible to get heterogeneous eltypes out. And https://github.com/JuliaStats/StatsModels.jl/pull/294, by making promotion less aggressive by default, may in fact make the problem worse here if, for instance, users are doing something like y ~ 1 + x where x is Int64.
I'd suggest something like a promotion (or at least an eltype check) in https://github.com/JuliaStats/GLM.jl/blob/468d0f84bc98ad09f9784d7061c21d74e30ad0dc/src/lm.jl#L129 to ensure that the eltype is consistent, if that's really required for GLM to work.
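A hypothetical version of that promotion (promote_X_y is a made-up name, not GLM's actual internals):

# Promote X and y to a common float eltype before fitting, so that
# downstream methods like delbeta! see matching element types.
function promote_X_y(X::AbstractMatrix, y::AbstractVector)
    T = promote_type(float(eltype(X)), float(eltype(y)))
    return convert(Matrix{T}, X), convert(Vector{T}, y)
end

X = Float64[1 1; 1 2; 1 3]   # model matrix already promoted by StatsModels
y = Float32[11, 12, 13]      # response left at Float32
Xp, yp = promote_X_y(X, y)
eltype(Xp) == eltype(yp)     # true: both are Float64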
Ahahaha yes, here's exactly what I was talking about... the StatsModels doctests picked this up (a lack of Bool-to-Float64 conversion): https://github.com/JuliaStats/StatsModels.jl/actions/runs/5095234337/jobs/9159967200?pr=294#step:5:305