Arblib.jl
A*A' significantly slower than A*A for ArbMatrices
Hi,
I was trying to see if I could use this package instead of BigFloats for some of my computations. Among other things (such as Cholesky decompositions and computing the minimum eigenvalue), I need to compute A*A' for matrices of size up to about 300x300, which is currently the bottleneck.
I came across the following remarkable difference:
julia> A = ArbMatrix(rand(BigFloat,100,100));
julia> @benchmark $A*$A
BenchmarkTools.Trial:
memory estimate: 48 bytes
allocs estimate: 1
--------------
minimum time: 128.750 ms (0.00% GC)
median time: 132.684 ms (0.00% GC)
mean time: 133.845 ms (0.00% GC)
maximum time: 148.832 ms (0.00% GC)
--------------
samples: 38
evals/sample: 1
julia> @benchmark $A*$A'
BenchmarkTools.Trial:
memory estimate: 249.02 MiB
allocs estimate: 4080001
--------------
minimum time: 3.125 s (22.31% GC)
median time: 3.331 s (26.56% GC)
mean time: 3.331 s (26.56% GC)
maximum time: 3.537 s (30.31% GC)
--------------
samples: 2
evals/sample: 1
So multiplying by a transpose is significantly slower than a normal multiplication, perhaps because of all the allocations that happen in the transposed case. Comparing it to BigFloats:
julia> B = rand(BigFloat,100,100);
julia> @benchmark $B*$B
BenchmarkTools.Trial:
memory estimate: 218.35 MiB
allocs estimate: 4100002
--------------
minimum time: 911.017 ms (11.87% GC)
median time: 930.252 ms (11.95% GC)
mean time: 933.218 ms (11.67% GC)
maximum time: 962.344 ms (12.72% GC)
--------------
samples: 6
evals/sample: 1
julia> @benchmark $B*$B'
BenchmarkTools.Trial:
memory estimate: 218.35 MiB
allocs estimate: 4100002
--------------
minimum time: 1.228 s (8.92% GC)
median time: 1.237 s (9.04% GC)
mean time: 1.238 s (8.64% GC)
maximum time: 1.255 s (9.08% GC)
--------------
samples: 5
evals/sample: 1
So for BigFloats, multiplying by the transpose is also a bit slower, but only by about 30%, compared to roughly 2500% for ArbMatrix. Another option would be to use ordinary Julia matrices with Arb entries:
julia> C = rand(Arb,100,100);
julia> @benchmark $C*$C
BenchmarkTools.Trial:
memory estimate: 124.66 MiB
allocs estimate: 2040002
--------------
minimum time: 1.572 s (14.81% GC)
median time: 1.807 s (20.96% GC)
mean time: 1.872 s (25.84% GC)
maximum time: 2.237 s (37.54% GC)
--------------
samples: 3
evals/sample: 1
julia> @benchmark $C*$C'
BenchmarkTools.Trial:
memory estimate: 124.66 MiB
allocs estimate: 2040002
--------------
minimum time: 2.014 s (11.50% GC)
median time: 2.029 s (19.55% GC)
mean time: 2.166 s (22.57% GC)
maximum time: 2.456 s (34.14% GC)
--------------
samples: 3
evals/sample: 1
So this is more comparable to the BigFloats, staying within about a factor of 2 in both cases. (Interestingly, it uses only about half the memory and allocations.)
To me it seems like ArbMatrices fall back to the generic matrix multiplication when a transpose is involved. Is there an easy way around this?
PS: I also tried Nemo (which has similar speed for A*A and A*A'), but I also need the minimum eigenvalue and the Cholesky decomposition, which are currently not available in Nemo. So I would need to convert back and forth to BigFloats, or compute those myself; with Nemo I'm not sure how to do this, but with Arblib the conversion is easy. Hence I would very much like A*A' to be as fast as A*A.
> To me it seems like ArbMatrices fall back to the generic matrix multiplication when a transpose is involved. Is there an easy way around this?
Arb only implements the matrix multiplication A*B and has no special versions where A or B is transposed. The easiest thing is probably to realize A' as an ArbMatrix and then perform the multiplication. This is also how I would probably implement it here in Arblib. A pull request for this is very much welcome 🙂
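For reference, a minimal sketch of what such methods could look like (this is not the actual Arblib implementation; the signatures below are just an illustration of the idea, relying on the existing ArbMatrix constructor):

using Arblib, LinearAlgebra

# Sketch only: route adjoint/transpose arguments through the ArbMatrix
# constructor, so that Arb's native matrix multiplication is used instead
# of the generic fallback. For real ArbMatrices adjoint and transpose coincide.
Base.:*(A::ArbMatrix, B::Adjoint{<:Any,<:ArbMatrix}) = A * ArbMatrix(B)
Base.:*(A::ArbMatrix, B::Transpose{<:Any,<:ArbMatrix}) = A * ArbMatrix(B)
Base.:*(A::Adjoint{<:Any,<:ArbMatrix}, B::ArbMatrix) = ArbMatrix(A) * B
Base.:*(A::Transpose{<:Any,<:ArbMatrix}, B::ArbMatrix) = ArbMatrix(A) * B

This is essentially what A*ArbMatrix(A') does by hand, just packaged so that the usual A*A' syntax hits Arb's multiplication.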
julia> @btime A*A;
11.988 ms (10089 allocations: 4.11 MiB)
julia> @btime A*A';
786.593 ms (8160005 allocations: 436.25 MiB)
julia> C = similar(A); D = similar(A); @btime Arblib.mul!($D, $A, Arblib.transpose!($C, $A));
12.785 ms (85 allocations: 3.20 MiB)
We're hitting the generic matmul here:
julia> @which A*A
*(A::T, B::T) where T<:Union{AcbMatrix, AcbRefMatrix, ArbMatrix, ArbRefMatrix} in Arblib at /home/kalmar/.julia/dev/Arblib/src/matrix.jl:166
julia> @which A*A'
*(A::AbstractMatrix{T} where T, B::AbstractMatrix{T} where T) in LinearAlgebra at /home/kalmar/packages/julias/julia-1.6/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:151
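For repeated use, the in-place route above can also be wrapped in a small helper, for example (a sketch for square matrices; mul_self_transpose is just an illustrative name, not an Arblib function):

using Arblib

# Sketch: compute A * A' for a square ArbMatrix via Arb's native
# multiplication, materializing the transpose explicitly first.
function mul_self_transpose(A::ArbMatrix)
    At = Arblib.transpose!(similar(A), A)  # A' as an ArbMatrix
    return Arblib.mul!(similar(A), A, At)  # Arb matrix multiplication
end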
A*ArbMatrix(A') works like a charm! Thanks!
I might look into a pull request later.