MixedModels.jl icon indicating copy to clipboard operation
MixedModels.jl copied to clipboard

Profile the objective

Open dmbates opened this issue 3 years ago • 3 comments

  • At present there are two functions profileβ and profilelogσ for profiling w.r.t. the fixed-effects coefficients and w.r.t. the logarithm of the per-observation noise standard deviation.
  • Both produce a table with values of ζ (the signed square root of the change in objective from the optimum) and β and θ as StaticArrays.SVectors. The return value from profileβ also has some names of coefficients and the parmap to allow for θ to be transformed to standard deviations and correlations of random effects.
  • Forward and reverse interpolation splines are added for parameters to and from ζ

Did behavior change? Did you add need features? If so, please update NEWS.md

  • [ ] add entry in NEWS.md
  • [ ] after opening this PR, add a reference and run docs/NEWS-update.jl to update the cross-references.

Should we release your changes right away? If so, bump the version:

  • [ ] I've bumped the version appropriately

dmbates avatar Sep 21 '22 21:09 dmbates

I need to rewrite the destructuring syntax in the old, tedious syntax if we are to continue to test on the LTS version.

dmbates avatar Sep 21 '22 21:09 dmbates

Codecov Report

Patch coverage: 94.04% and project coverage change: -0.26 :warning:

Comparison is base (d946f6f) 95.90% compared to head (8977dc1) 95.64%.

:exclamation: Current head 8977dc1 differs from pull request most recent head f8f4559. Consider uploading reports for the commit f8f4559 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #639      +/-   ##
==========================================
- Coverage   95.90%   95.64%   -0.26%     
==========================================
  Files          29       35       +6     
  Lines        2732     3189     +457     
==========================================
+ Hits         2620     3050     +430     
- Misses        112      139      +27     
Impacted Files Coverage Δ
src/MixedModels.jl 100.00% <ø> (ø)
src/utilities.jl 96.92% <ø> (ø)
src/linearmixedmodel.jl 94.88% <28.57%> (-3.23%) :arrow_down:
src/profile/utilities.jl 94.31% <94.31%> (ø)
src/profile/thetapr.jl 97.14% <97.14%> (ø)
src/remat.jl 96.69% <97.22%> (-0.04%) :arrow_down:
src/bootstrap.jl 93.93% <100.00%> (+2.15%) :arrow_up:
src/profile/fixefpr.jl 100.00% <100.00%> (ø)
src/profile/profile.jl 100.00% <100.00%> (ø)
src/profile/sigmapr.jl 100.00% <100.00%> (ø)
... and 1 more

... and 1 file with indirect coverage changes

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Sep 21 '22 21:09 codecov[bot]

I need to rewrite the destructuring syntax in the old, tedious syntax if we are to continue to test on the LTS version.

@dmbates add a Compat.jl dependency and use @compat -- I like the new syntax, but I would still like to support LTS as long a possible.

palday avatar Sep 21 '22 22:09 palday

BSplineKit v0.12.0 requires julia v1.8 To be able to test on Julia LTS I allowed v0.11 of BSplineKit. This means that after contour interpolation is available, the tests will need to be restricted to julia v1.8 or later. Contour interpolation uses periodic splines, which became available in BSplineKit v0.12

dmbates avatar Oct 18 '22 16:10 dmbates

I would prefer to keep LTS compat ... I can see how hard it would be to get BSplineKit to support 1.6 (and whether the author is amenable to that....)

palday avatar Oct 18 '22 17:10 palday

If you have time to look at that, sure. I expect that the periodic splines, which were introduced in BSplineKit v0.12.0, will be needed in MixedModelsMakie but probably not in MixedModels.

The way I imagine the division of responsibility is that MixedModels will add the forward and backward splines to the MixedModelProfile, because they are used to obtain profile-based confidence intervals. These splines are natural cubic splines. The reverse splines, mapping the zeta values to the fixed-effects coefficients, are used to create these profile-based confidence intervals. The periodic splines are used for contour interpolation, which should be in MixedModelsMakie, I think.

That is, we can probably live with a compat entry in MixedModels.jl allowing 0.11,0.12 for BSplineKit.

dmbates avatar Oct 18 '22 17:10 dmbates

@dmbates Okay, I won't be making a compat fix. The problem is a missing method strides(::SparseMatrixCSC), which comes up in an ldiv operation. (https://github.com/JuliaLang/julia/pull/44027 added an appropriate method). Then we'll do your proposed compat fix and when it comes time to plotting the contours in MixedModelsMakie, we'll do a @static if VERSION and give an informative error for Julia < 1.8.

palday avatar Oct 18 '22 18:10 palday

Sounds good. Thanks for looking into that.

dmbates avatar Oct 18 '22 18:10 dmbates

@palday My plan is to make the show method for a MixedModelProfile just show the tbl field.

I like the way the Table is set up now. I still need to work on profiling the theta parameters or the vc's and cp's as I guess we will take to calling them. One way to profile the theta parameters is to change the definition of the objective function in the optimization to fold the value of the parameter being profiled back in to the theta vector. For the time being I think I will write a new function to do the conditional optimization and later we can see if we want to modify the one in linearmixedmodels.jl to allow for theta with one of its components fixed.

dmbates avatar Oct 25 '22 20:10 dmbates

@palday Do you have any insight into the test failures? As far as I can see I haven't modified anything that should cause that test to fail? It may be a Heisenbug if you look at the index range reported in the error. I'm not altogether sure that the formula makes sense for the GLM used to initialize the estimates of the fixed-effects. The GLM is fit during the construction of the GeneralizedLinearMixedModel.

My comment about the formula and the GLM is because this formula has no fixed effects so the GLM will have no coefficients.

dmbates avatar Oct 25 '22 20:10 dmbates

I've restarted CI to see if it was a one-off.

For posteriority, the test failure was:

previous formula misspecification: Error During Test at /home/runner/work/MixedModels.jl/MixedModels.jl/test/misc.jl:14
  Test threw exception
  Expression: MixedModel(#= /home/runner/work/MixedModels.jl/MixedModels.jl/test/misc.jl:14 =# @formula(yield ~ 0 + (1 | batch)), dyestuff, Poisson()) isa GeneralizedLinearMixedModel
  BoundsError: attempt to access 0×0 Matrix{Float64} at index [1:140628461236480, 1:140628461236480]

If it turns out to be legitimate failure, I have a few more ideas.

palday avatar Oct 25 '22 20:10 palday

@dmbates I bet this is from one of the recent changes in GLM.jl .... I have an idea how to fix and will ping you soon 🙂

palday avatar Oct 25 '22 20:10 palday

@dmbates I bet this is from one of the recent changes in GLM.jl .... I have an idea how to fix and will ping you soon slightly_smiling_face

@palday That was my guess too. Has GLM now changed to using the QR decomposition rather than the Cholesky?

dmbates avatar Oct 25 '22 20:10 dmbates

@dmbates recent changes in GLM that might impact this:

  • better tolerance of some floating point weirdness in binomial GLMs (1.8.1, most recent)
  • change in definition of how some quantities are computed for null models, including models without an intercept

Wait, I just found a better suspect, which snuck into the 1.8.1 patch release even though it should have been it's own 1.9 release

Implementation of dropcollinear feature in GeneralizedLinearModel (#488) (@mousum-github)

For my own reference, it looks like there are some BLAS differences between the CI runners and BLAS differences + the above things are often enough to create Heisenbugs around some edge cases.

Failed nightly:

Julia Version 1.9.0-DEV.1657
Commit 35d12890aba (2022-10-25 07:47 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 2 × Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, haswell)
  Threads: 1 on 2 virtual cores

Failed LTS:

Julia Version 1.6.7
Commit 3b76b25b64 (2022-07-19 15:11 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Passed CI

Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 2 × Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, icelake-server)

Failed CI

Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 2 × Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake-avx512)
  Threads: 1 on 2 virtual cores

palday avatar Oct 25 '22 21:10 palday

@palday I am leaning towards profiling the standard deviations of the random effects separately from the correlation parameters. I think your introduction of the capability to set a value of sigma in the optsum will allow us to evaluate the profiled objective with respect to the standard deviation of a random effect without a lot of new code.

We just redefine the objective function as

    function obj(x, g)
        isempty(g) || throw(ArgumentError("g must be empty"))
        updateL!(setθ!(m, x))
        m.optsum.sigma = vc[] / norm(view(m.reterms[i].λ.data, j, :))
        return objective(m)
    end

for the value of the standard deviation in vc[] (vc has to be a Ref in this formulation) of the jth component of the random effects associated with the ith grouping factor in the model m.

This may not be the most effective way of doing the conditional optimization but it is very low impact in terms of writing new code.

I'll work on fleshing this idea out a bit more when I return from my walk.

dmbates avatar Oct 26 '22 19:10 dmbates

@palday I have been thinking about the fields of the MixedModelProfile type. I feel it would be best to keep a shallow copy of the LinearMixedModel object itself in the profile object. That is, store the read-only fields that define the model and allocate copies of the fields in optsum and in reterms that are updated by setθ!. Then we won't need to store the various coefficient names and grouping factor names and the parmap in the MixedModelProfile.

One advantage of this is that after fitting a model and profiling the fit we only need to store the profile because we can extract the model if we want to see or use it.

Although L does get modified by updateL! I don't think we should copy that whole field, which can be very large. We just need to enforce the convention that evaluating any properties of the model should only be done after updateL! unless you are sure that the contents of L are up-to-date.

Does this seem like a reasonable approach?

dmbates avatar Nov 02 '22 14:11 dmbates

Although L does get modified by updateL! I don't think we should copy that whole field, which can be very large. We just need to enforce the convention that evaluating any properties of the model should only be done after updateL! unless you are sure that the contents of L are up-to-date.

I think this is okay if we are very clear in documentation about the shared reference and all the implications:

  • not thread safe
  • can leave a different object (the original model) in an inconsistent state if interrupted

palday avatar Nov 02 '22 15:11 palday

@palday There are still some glitches in the examples, mostly due to scaling issues when choosing an initial step for the components of θ. That will need more thought and better code.

I am working through the examples and trying out the zetaplot in the profile branch of MixedModelsMakie. That plot needs decluttering.

dmbates avatar Nov 16 '22 18:11 dmbates

There are many changes in commit 3cc7b53 that I would appreciate experiences with and comments upon @palday @kliegl. I have switched to storing Tuples in most of the columns of the table and storing the values of the standard deviations of the random effects as part of the "sigma" column. It turns out that for a table like this there isn't much of an advantage in using StaticArrays.SVectors rather than tuples. I have some code to start profiling the random-effects standard deviations. The problem is deciding how large a first step to take and I have a few ideas about that.

The confint method produces confidence intervals on all of the parameters that have been profiled. Corner cases still need to be considered. Nevertheless I am encouraged by this progress.

dmbates avatar Nov 20 '22 19:11 dmbates

@dmbates Exciting! Will take a look tomorrow.

kliegl avatar Nov 20 '22 20:11 kliegl

@dmbates Using the MixedModels#profile branch, where do I find the final two plot functions?

using CairoMakie, MixedModels, MixedModelsMakie
CairoMakie.activate!(; type="svg")
# works
m1 = fit(MixedModel, @formula(yield ~ 1 + (1|batch)), MixedModels.dataset(:dyestuff))
pr01 = profile(m1);
confint(pr01)    # default is 95% coverage
# where do I find the following functions?
zetaplot(pr01)  # default intervals are at 50%, 80%, 90%, 95% and 99% coverage
profiledensity(pr01)  # the density functions that would produce these profiles

kliegl avatar Nov 20 '22 21:11 kliegl

@kliegl you also need MixedModelsMakie#profile for plotting.

palday avatar Nov 21 '22 01:11 palday

Works! Thanks. I am set to try it out on some of my current work. Great.

kliegl avatar Nov 21 '22 01:11 kliegl

@palday @kliegl I have a Quarto document where I am developing the interpolated contour plots. I find it easier to write scripts first then figure out how I should structure the functions. Anyway, it is working. If you want a preview edition I can send it to you however you want to receive it. I probably won't make a repository because it is not really permanent but I can send you the .qmd file then all you need to do is create a quarto project directory, start Julia and add the necessary packages then render the document.

dmbates avatar Nov 22 '22 20:11 dmbates

@dmbates I would like the .qmd file -- email is fine.

kliegl avatar Nov 22 '22 20:11 kliegl

I forgot, it's Quarto so there is a built-in publishing mechanism. See https://dmbates.quarto.pub/plotting-profiles/

dmbates avatar Nov 22 '22 20:11 dmbates

Damn, trying to profile the standard deviations of the random effects is taking its toll on me. I have been working for weeks on it and every time I think I have a good way of doing it, one or more of our examples blows up in my face.

I am going to commit some code that works for several examples but fails on rather simple examples like the dyestuff data. This is just to avoid my cursing the universe if my computer fails and I lose all my work for the last couple of months.

dmbates avatar Jan 27 '23 17:01 dmbates

On further examination it seems that the problems start when a value of zero for the theta parameter is passed to the objective function within MixedModels.profilevc, which returns Inf, and things go downhill from there. Will try to find a workaround.

dmbates avatar Jan 27 '23 19:01 dmbates

This PR is finally ready for others to take it out for a spin.

It adds a profile function that can be applied to a LinearMixedModel. At present it profiles all the parameters except for the correlation parameters, returning a MixedModelProfile object. This object contains the original model (hopefully with any changes made to values during the profiling process restored - but this would be a good thing to check), a row-table called tbl which contains the parameter values and zeta values from the profiling and two Dicts of natural, cubic interpolation splines. The fwd Dict contains an interpolation spline of zeta versus the parameter being profiled, and the rev dict is the opposite direction. These splines are created using the BSplineKit interpolate function. Some of these splines wiggle a bit too much for my taste so I plan to look at the code in that package a bit more carefully.

The parameters should be subdivided into beta's, the fixed-effects parameters, theta's, elements of the relative covariance factor, and sigma's, the variance components - but on the standard deviation scale. For now, confint applied to the profile produces confidence intervals on all of these parameters. We should consider allowing for confidence intervals on only the beta's or the beta's and sigma's. The theta's should probably be kept distinct from the other parameters. They are used for optimization and we may want to investigate their contours, etc., but they are indirect and not of that much interest for inference by themselves.

At this point the rho parameters are evaluated for the other profiles but are not profiled themselves. They are tricky - even trickier than the variance component parameters. We will need to decide if the effort to write code to profile them is warranted.

There is a profile branch of the MixedModelsMakie package that provides zetaplot to plot the forward splines. Again, this should probably have an argument to choose just beta's or just beta's and sigma's for the plot as those will be the parameters of greatest interest. I'll be working on that code next.

dmbates avatar Jan 28 '23 18:01 dmbates

Could one make an argument that profiling is "advantageous" for thetas, but bootstrapping is "good enough" for CPs?

kliegl avatar Jan 28 '23 18:01 kliegl