AbstractAlgebra.jl

Matrix-Strassen doctest needs excessive time (needs fix and re-enabling)

Open lgoettgens opened this issue 6 months ago • 5 comments

See e.g. https://github.com/Nemocas/AbstractAlgebra.jl/actions/runs/14963175106/job/42028470785 The doctest step in the 1.10, ubuntu-latest job needed 15 min, while the same step in the 1.10, macOS-latest job needed 64 min. Similar timings can be seen for other CI runs as well.

Any idea what's causing this?

cc @fingolfin @benlorenz

lgoettgens, May 15 '25 13:05

Not sure what happens in CI, but when testing locally on munk (with the documenter_helpers.jl from Oscar) most of the time of the doctest run was spent in src/Matrix-Strassen.jl:15-28:

page: src/Matrix-Strassen.jl:15-28
429.511165 seconds (12.88 G allocations: 233.800 GiB, 36.81% gc time, 0.49% compilation time)

(this is out of 8m11.5s total)

I pasted the code block into a Julia 1.10 REPL on my Linux machine as well and it is still running after 13 minutes.

I wanted to re-run the doctests but munk kicked me out of my sessions and doesn't respond anymore:

page: src/Matrix-Strassen.jl:15-28
Connection to munk.mathematik.uni-kl.de closed by remote host.
Connection to munk.mathematik.uni-kl.de closed.

Edit: It seems to have rebooted, let's see what happens when running it again. The re-run also took about 8 minutes, so the duration seems pretty stable.

The code block in question:

julia> m = matrix(ZZ, rand(-10:10, 1000, 1000));

julia> n1 = similar(m); n2 = similar(m); n3 = similar(m);

julia> n1 = mul!(n1, m, m);

julia> n2 = Strassen.mul!(n2, m, m);

julia> n3 = Strassen.mul!(n3, m, m; cutoff = 100);

julia> n1 == n2 == n3
true

This does rely on random numbers and I don't know if this is seeded in any way.
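In case the doctest setup does not seed the RNG, here is a minimal sketch of how the input could be made reproducible with the Random standard library. This is an assumption about a possible fix, not a description of what the AbstractAlgebra doctest setup actually does; the seed value 42 and the small 4x4 size are arbitrary and only for illustration.

```julia
using Random  # standard library, no extra packages needed

# Seed the global RNG before generating the entries (42 is an arbitrary choice)
Random.seed!(42)
a = rand(-10:10, 4, 4)

# Re-seeding with the same value reproduces the same matrix
Random.seed!(42)
b = rand(-10:10, 4, 4)

a == b
```

Note that the final check in the doctest (n1 == n2 == n3) should hold for any input, so seeding would mainly make the timings comparable between runs rather than change the expected output. Random streams can also differ between Julia versions, so a fixed seed only gives identical inputs on the same version.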

Edit: I have started a run on CI without the Strassen block and with the custom output: https://github.com/Nemocas/AbstractAlgebra.jl/actions/runs/15050984971 Using this commit: https://github.com/Nemocas/AbstractAlgebra.jl/commit/81911b6953ba64a099ed5ef68e97dcfc833120d8

benlorenz, May 15 '25 16:05

Without the Matrix-Strassen test the doctests took just 1min17s on macOS: https://github.com/Nemocas/AbstractAlgebra.jl/actions/runs/15050984971/job/42305480071#step:7:698

benlorenz, May 15 '25 17:05

Can we then remove this doctest in the meantime, let @fieker figure out the problem, and re-enable the doctest once there is a fix?

lgoettgens, May 15 '25 17:05

Some further data: Ubuntu CI (out of 15 min total):

page: src/Matrix-Strassen.jl:15-28
805.782778 seconds (12.88 G allocations: 233.799 GiB, 27.81% gc time, 0.49% compilation time)

Windows CI (out of 21 min total):

page: src\Matrix-Strassen.jl:15-28
1131.379076 seconds (12.88 G allocations: 233.796 GiB, 21.23% gc time, 0.38% compilation time)

macOS CI (out of 47 min total):

page: src/Matrix-Strassen.jl:15-28
2736.860504 seconds (12.88 G allocations: 233.799 GiB, 78.58% gc time, 0.54% compilation time)

macOS does have a lot more GC time, so it might be caused by the runner having less memory available: 7 GB vs 16 GB on Linux.

Looking only at the time spent outside GC: 21% of 2737 seconds (macOS) gives 575 seconds and 72% of 805 seconds (Ubuntu) gives 580 seconds, which does match quite well.
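The same comparison can be redone with the unrounded figures. The following is a small sketch that subtracts the reported GC share from the total time for each platform; the helper name non_gc_seconds is made up for this illustration, and the numbers are copied from the timing output above.

```julia
# (platform, total seconds, GC share) copied from the timing output above
runs = [
    ("Ubuntu",   805.782778, 0.2781),
    ("Windows", 1131.379076, 0.2123),
    ("macOS",   2736.860504, 0.7858),
]

# Time left after subtracting the reported GC share (hypothetical helper)
non_gc_seconds(total, gc_share) = total * (1 - gc_share)

for (platform, total, gc_share) in runs
    secs = round(non_gc_seconds(total, gc_share); digits = 1)
    println(rpad(platform, 8), secs, " s outside GC")
end
```

With the exact percentages, Ubuntu and macOS both land between roughly 580 and 590 seconds, consistent with the estimate above. Windows comes out noticeably higher, so the close match is specifically between the Linux and macOS runners.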

benlorenz, May 15 '25 18:05

The doctest in question has been disabled in https://github.com/Nemocas/AbstractAlgebra.jl/pull/2085 to reduce the load on the CI runners. @fieker could you please check whether the main problem is the Strassen code or the naive multiplication that's used in the tests? In the former case, I think you would be interested in trying to find the underlying issue. In the latter case, could you think of something smaller to test that still hits all code branches?

lgoettgens, May 15 '25 19:05