GLM.jl icon indicating copy to clipboard operation
GLM.jl copied to clipboard

Should GlmResp be a Table?

Open dmbates opened this issue 6 years ago • 6 comments

A GlmResp contains several parallel vectors, and a distribution D. Most of the functions that apply to these objects iterate over these vectors in parallel. Would it make sense to store these vectors in a Table in the sense of the https://github.com/JuliaData/Tables.jl package? The update operations are like iterating over a rowtable.

dmbates avatar Jun 20 '19 18:06 dmbates

I think Tables.jl might be more general than needed for GlmResp. I like https://github.com/piever/StructArrays.jl which is conceptually a bit simpler and would give us row iteration.

andreasnoack avatar Jun 20 '19 18:06 andreasnoack

That sounds good to me.

quinnj avatar Jun 20 '19 18:06 quinnj

@dmbates I think we need some change here both for GlmResp and LmResp. The reason is that if your y is a view or e.g. Arrow.Primitive fitting will not work as you are unable to ensure that all V<:FPVector have the same type. I am not sure how to best fix it as I do not know internals of GLM.jl well enough, but for sure the current specification is too restrictive.

bkamins avatar Jun 06 '21 06:06 bkamins

GlmResp and LmResp were formulated a long time ago and a redesign may be warranted. There are several competing goals - generality, ease-of-use, efficiency for the most common cases - that would need to be balanced. The original design was based on the glm function in R which also was formulated early on in the history of the language and based on an even earlier implementation in S. It would be worthwhile discussing the overall structure and how these different objectives should be balanced. I don't see a Discussions area in this repository. Is it possible to add it? I looked but didn't see an obvious way to do so.

dmbates avatar Jun 06 '21 16:06 dmbates

I think it is OK just to discuss it in this issue.

bkamins avatar Jun 06 '21 16:06 bkamins

Having done some more benchmarking it looks like I am trying to solve a non-problem here. I keep convincing myself that the bottleneck in fitting a glm must be the call to updateμ! because it has to evaluate so many intermediates but it turns out that is not the case.

What I will do instead is update the perf/glm.jl file to modern Julia so that others can run benchmarks and profiles.

dmbates avatar Jun 06 '21 19:06 dmbates