GLM.jl icon indicating copy to clipboard operation
GLM.jl copied to clipboard

linear model with Float32

Open Ankur-deDev opened this issue 7 years ago • 11 comments

Hi GLM,

I am trying to use the lm function with the following command: mydata = DataFrame(X=MyDataType.(1:3), Y=MyDataType.(11:13)); lm(@formula(Y ~ 1 + X), mydata)

This works fine with MyDataType = Float64 or Int32 and Int64 but with Float32 I get the following error:

ERROR: MethodError: no method matching delbeta!(::DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}, ::Array{Float32,1}) Closest candidates are: delbeta!(::DensePredQR{T<:Union{Float32, Float64}}, ::Array{T<:Union{Float32, Float64},1}) where T<:Union{Float32, Float64} at /home/tooler/.julia/packages/GLM/FxTmX/src/linpred.jl:76 delbeta!(::DensePredChol{T<:Union{Float32, Float64},#s14} where #s14<:LinearAlgebra.Cholesky, ::Array{T<:Union{Float32, Float64},1}) where T<:Union{Float32, Float64} at /home/tooler/.julia/packages/GLM/FxTmX/src/linpred.jl:130 delbeta!(::DensePredChol{T<:Union{Float32, Float64},#s14} where #s14<:LinearAlgebra.CholeskyPivoted, ::Array{T<:Union{Float32, Float64},1}) where T<:Union{Float32, Float64} at /home/tooler/.julia/packages/GLM/FxTmX/src/linpred.jl:135

My environment is the following:

julia version 1.0.1 [38e38edf] GLM v1.0.1

Ankur-deDev avatar Oct 18 '18 00:10 Ankur-deDev

The code probably needs some tweaks to be more generic. PR welcome.

nalimilan avatar Oct 18 '18 15:10 nalimilan

@kleinschmidt Is it on purpose that the model matrix promotes to Float64 even though the data columns are Float32?

It's worth noticing that lm(Matrix{Float32},Vector{Float32}) actually works so it's not Float32 that is the problem per se. The problem is the mix of the design matrix being Float64 and the response being Float32 that gives the error.

andreasnoack avatar Oct 18 '18 20:10 andreasnoack

I don't know actually...I suspect it was just a sensible default. I think it could be overridden by specifying ModelMatrix{Matrix{Float32}} in fit. Or by doing some promoteing.

I'm not super inclined to do the deep dive in the current StatsModels code to fix this though given that (inshallah) that's going to be replaced by Terms 2.0 (JuliaStats/StatsModels.jl#71).

kleinschmidt avatar Oct 18 '18 21:10 kleinschmidt

It's also hard to handle this in a sufficiently general way since you might have a categorical variable that gets expanded according to a contrasts matrix; that defaults to Float64, so you'd somehow have to figure out to un-promote that to Float32 in the case when the only continuous variables are Float32s.

kleinschmidt avatar Oct 18 '18 21:10 kleinschmidt

Also to clarify what @andreasnoack said: this is an issue with StatsModels (which handles the formula), not with GLM.

kleinschmidt avatar Oct 19 '18 15:10 kleinschmidt

I just encountered the same issue. This problem occurs when reading data from Stata's .dta files using StatFiles.jl. Stata seems to work with lower precision.

greimel avatar May 02 '19 11:05 greimel

Glad the other lm function with no formula works with Float32. I would not put the function with formula on doc if it doesn' t work with such a basic data type. Is it a good time to fix this?

xgdgsc avatar May 25 '23 09:05 xgdgsc

I guess the question is, what should GLM do when it gets a y and X of heterogenous eltypes? the current situation with (IIRC) a method error is not great, and all the fixes I can think of for StatsModels will not completely solve this problem (it would still be possible to get heterogenous eltypes out, and https://github.com/JuliaStats/StatsModels.jl/pull/294 by making promotion less aggressive by default may in fact make the problem worse here if, for instance, users are doing something like y ~ 1 + x where x is Int64 or something...)

I'd suggest something like a promotion (or at least an eltype check) in https://github.com/JuliaStats/GLM.jl/blob/468d0f84bc98ad09f9784d7061c21d74e30ad0dc/src/lm.jl#L129 to ensure that teh eltype is consistent, if that's really required for GLM to work.

kleinschmidt avatar May 26 '23 21:05 kleinschmidt

Ahahaha yes here's exactly what I was talking about...statsmodels doctests picked up this (lack of Bool-to-Float64 conversion): https://github.com/JuliaStats/StatsModels.jl/actions/runs/5095234337/jobs/9159967200?pr=294#step:5:305

kleinschmidt avatar May 26 '23 22:05 kleinschmidt