[WIP] Allow TriangularRFP for diagonal blocks of L [ci skip]
- Allow blocks on the diagonal of `L` to be `TriangularRFP` from https://github.com/JuliaLinearAlgebra/RectangularFullPacked.jl
- In the process of making these changes I found that I needed to be more explicit about when a diagonal block is being treated as `Hermitian` and stored in the lower triangle (`copyscaleinflate!` and `rankUpdate!`) and when it is lower triangular (after `cholUnblocked!` and in the `rdiv!` call).
- I had previously been playing somewhat fast and loose with these distinctions, just passing around `Matrix` objects for the dense case, but that can trip you up.
- The timings shown below indicate that there is a considerable speed penalty for `TriangularRFP`. The slowdown is mostly in the `rankUpdate!` method, which involves many calls to `setindex!`, rather than in the `cholUnblocked!` method.
- It may be possible to speed up some of the calls to `setindex!` because of the way the loops in `rankUpdate!` work.
- There is a new named argument, `RFPthreshold`, in the `LinearMixedModel` constructor (but not yet in `MixedModel`).
- Right now this has a `[ci skip]` designation because I haven't updated all the tests to the more stringent requirements on the `Hermitian` and `Triangular` types.
- Tests should now pass (but apparently there are failures with some configurations).
- Fixes #813
Some timings on my laptop
```julia
(jl_6i3bqi) pkg> st
Status `/private/var/folders/d9/t94kd5m50rl1r3bchncn62sm0000gn/T/jl_6i3bqi/Project.toml`
  [ff71e718] MixedModels v4.34.0

julia> dat = MixedModels.dataset(:insteval)
Arrow.Table with 73421 rows, 7 columns, and schema:
 :s        String
 :d        String
 :dept     String
 :studage  String
 :lectage  String
 :service  String
 :y        Int8

julia> f = @formula(y ~ 1 + service + (1|s) + (1|d) + (1|dept));

julia> m1 = fit(MixedModel, f, dat; progress=false);

julia> @be objective(updateL!($m1))
Benchmark: 20 samples with 1 evaluation
 min    5.143 ms (38 allocs: 1.094 KiB)
 median 5.172 ms (38 allocs: 1.094 KiB)
 mean   5.176 ms (38 allocs: 1.094 KiB)
 max    5.270 ms (38 allocs: 1.094 KiB)
```
```julia
# using the db/RFP branch
julia> m1 = fit(MixedModel, f, dat; progress=false);

julia> BlockDescription(m1)
rows:    s        d             dept        fixed
2972:    Diagonal
1128:    Sparse   Diag/TrRFP
  14:    Dense    Sparse/Dense  Diag/Dense
   3:    Dense    Dense         Dense       Dense

julia> @be objective(updateL!($m1))
Benchmark: 4 samples with 1 evaluation
 min    26.432 ms (31 allocs: 768 bytes)
 median 26.525 ms (31 allocs: 768 bytes)
 mean   26.669 ms (31 allocs: 768 bytes)
 max    26.937 ms (31 allocs: 768 bytes)
```
```julia
julia> m2 = LinearMixedModel(f, dat; RFPthreshold=2000);

julia> BlockDescription(m2)
rows:    s        d             dept        fixed
2972:    Diagonal
1128:    Sparse   Diag/Dense
  14:    Dense    Sparse/Dense  Diag/Dense
   3:    Dense    Dense         Dense       Dense

julia> @be objective(updateL!($m2))
Benchmark: 20 samples with 1 evaluation
 min    5.144 ms (28 allocs: 656 bytes)
 median 5.173 ms (28 allocs: 656 bytes)
 mean   5.173 ms (28 allocs: 656 bytes)
 max    5.256 ms (28 allocs: 656 bytes)
```
There may be a way of cutting down on the cost of the rankUpdate! operation that evaluates L[2,2] - L[2,1] * L[2,1]', which accounts for the largest amount of time spent in updateL! when TriangularRFP is used.
The bottleneck is not evaluating each update; it is writing it back to the TriangularRFP matrix with setindex!. Because we are storing the lower triangle in the RFP format, we could use linear indexing to the underlying matrix for the first half of the columns of L[2,2]. The [i,j] entry in L[2,2] is exactly the [i,j] entry of L[2,2].parent, if the size of L[2,2] is odd, or the [i + 1, j] entry of L[2,2].parent, if the size is even.
The position of the elements in the lower right triangle of L[2,2] is more complicated to determine but we could either code that up or just fall back on the existing setindex! method. Furthermore, we could sort the rows of L[2,1] and L[2,2] by descending order of A[2,2].diag so that more of the updates are in the left half of L[2,2].
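The index arithmetic described above can be sketched in a few lines. This is a Python illustration (0-based indexing, whereas the Julia code is 1-based) of the standard LAPACK RFP layout for TRANSR='N', UPLO='L', which is what RectangularFullPacked.jl wraps; `rfp_index` is a hypothetical helper, not a function from either package:

```python
def rfp_index(n, i, j):
    """Map entry (i, j), i >= j, of an n x n lower triangle (0-based) to its
    (row, col) position in the RFP parent array (LAPACK TRANSR='N', UPLO='L').

    The parent is n x (n+1)//2 for odd n, (n+1) x n//2 for even n.
    """
    k = (n + 1) // 2  # number of columns in the parent array
    if j < k:
        # Left-half columns are in place for odd n, shifted down one row for even n.
        return (i + (n % 2 == 0), j)
    # The lower-right triangle is stored transposed in the top rows of the parent.
    return (j - k, i - k + (n % 2))
```

With this mapping every entry of the triangle lands in a distinct slot of the parent, and the left-half columns need only the cheap "same index or shifted down by one" rule, which is what makes direct indexing for the first half of the columns attractive.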
Does this seem worth trying, @palday @ajinkya-k?
I am willing to work on it and feel that it could be done relatively quickly but experience suggests that my "relatively quickly" may not be as quick as I imagine.
seems worth trying for sure.
I agree it's worth trying!
~Isn’t there an SYRK for rectangular full packed format?~
Sorry I totally misunderstood the code (https://github.com/JuliaLinearAlgebra/RectangularFullPacked.jl/blob/b159bfabd981c46b8842ff1ad857aa95f6b52cad/src/hermitian.jl#L43)
The setindex! slowness is only for the case where the argument A is a SparseMatrixCSC, correct?
@ajinkya-k I should have been more clear that I was speaking of the case where L[2,1] is sparse. Using RFP for L[2,2] only makes sense if L[2,2] is very large and, by design, L[2,1] is even larger (because L[1,1] is as large as possible). If L[2,1] needs to be stored as a dense matrix you have already got a much bigger problem to contend with.
No I think that was clear but I twisted myself into knots trying to read the file changes from the PR
Codecov Report
Attention: Patch coverage is 84.76821% with 23 lines in your changes missing coverage. Please review.
Project coverage is 96.73%. Comparing base (35747ac) to head (09ad439).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/remat.jl | 54.54% | 20 Missing :warning: |
| src/linalg/rankUpdate.jl | 91.66% | 2 Missing :warning: |
| src/pca.jl | 83.33% | 1 Missing :warning: |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #821      +/-   ##
==========================================
- Coverage   97.33%   96.73%   -0.60%
==========================================
  Files          36       36
  Lines        3495     3588      +93
==========================================
+ Hits         3402     3471      +69
- Misses         93      117      +24
```
| Flag | Coverage Δ | |
|---|---|---|
| current | 96.37% <84.10%> (-0.62%) | :arrow_down: |
| minimum | 96.73% <84.76%> (-0.60%) | :arrow_down: |
| nightly | 96.33% <84.66%> (-0.60%) | :arrow_down: |
@dmbates can you add a "fixes #813" to the PR description so that the issue is automatically closed when this is merged?
So I have a question about strategy. When using TriangularRFP for (usually) the [2,2] block of L, which implies that the [2,1] block will be a SparseMatrixCSC, the bottleneck is the rankUpdate! operation. It is slow because every assignment like C[i,j] += A[j, k] * A[i, k] has to go through a complicated mapping from the virtual indices [i,j] to the Cartesian indices in C.data.
When it is the lower triangle of C being stored in C.data (and C.transa is false, which is always the case for us) the first half of the columns of C are either in their original positions in C.data or shifted down by one position.
I did a calculation of the number of updates in this nonflipped configuration versus the number in the flipped configuration for the insteval model. It was about 3/4 nonflipped and 1/4 flipped, which is roughly the proportion you would expect based on the sizes of the chunks in C.
(980444, 306131)
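A count of that kind could be computed along the following lines. This is a Python sketch, not the actual MixedModels.countinds code: for each column of the sparse A = L[2,1], the rank update touches C[i,j] for every pair of nonzero rows i >= j in that column, and the update is "nonflipped" when j falls in the first half of the columns of C. The name `count_updates` is hypothetical:

```python
def count_updates(n, cols):
    """Count (nonflipped, flipped) rank-update targets C[i, j], i >= j,
    for an n x n RFP-stored block C, where `cols` is a list of sorted
    0-based nonzero row-index lists, one per column of the sparse factor A.

    An update is "nonflipped" when j < ceil(n / 2), i.e. the target lies
    in the cheaply indexed left half of the RFP parent array.
    """
    k = (n + 1) // 2
    nonflipped = flipped = 0
    for rows in cols:
        for a, j in enumerate(rows):
            m = len(rows) - a  # number of i >= j among this column's nonzeros
            if j < k:
                nonflipped += m
            else:
                flipped += m
    return nonflipped, flipped
```

Reordering the rows of A so that high-count rows get small indices moves more of the (i, j) pairs into the j < k region, which is what drives the improvement shown below.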
However, if we reorder the rows of A by decreasing row sums
```julia
julia> MixedModels.countinds(L[p, :])
(1221833, 64742)
```
we get about 95% non-flipped and that certainly seems worth it. So, at what point should the levels of the second grouping factor be reordered?
I think it will be rare to need to use RFP but, if L[2,1] is a SparseMatrixCSC there wouldn't be much harm in reordering the rows (i.e. permute the levels of the second grouping factor).
Is this explanation sufficiently intelligible to be able to offer an opinion?
@dmbates I think that change makes sense. I also wonder if this would slightly speed up things for the non-RFP case --- I've certainly seen this type of optimization have big impacts in other code.
If I'm thinking about this correctly, ordering by row-sums works out to ordering by something like "strength of crossing", right?
We don't make any guarantees about the internal ordering of levels, so that's not an issue.
For the time being I will do the simplified indexing part and not do the reordering of levels part. However, we can recommend reordering the levels before creating the model.
I wrote the enclosed to do the reordering of levels in a CategoricalArray by incidence. (I'm not a big fan of CategoricalArrays.jl but it seemed the easiest way to accomplish this.)
```julia
using CategoricalArrays, DataFrames, MixedModels

"""
    relevel_by_incidence!(v::CategoricalArray; rev=false)

Return `v` with the levels ordered by increasing (decreasing when `rev=true`) incidence.
"""
function relevel_by_incidence!(v::CategoricalArray; rev=false)
    ra = refarray(droplevels!(v))
    counts = zeros(Int, maximum(ra))
    for i in ra
        counts[i] += 1
    end
    levels!(v, levels(v)[sortperm(counts; rev)])
    return v
end

dat = DataFrame(MixedModels.dataset(:insteval))
dat.d = relevel_by_incidence!(categorical(dat.d); rev=true)
m = LinearMixedModel(@formula(y ~ 1 + service + (1|s) + (1|d) + zerocorr(1+service|dept)), dat)
m.A[3].diag
```