MendelIHT.jl
MendelIHT.jl copied to clipboard
Precompilation takes forever
Some recent changes caused precompilation to be unacceptably slow (200-500+ seconds). MWE
using MendelIHT
cd(normpath(MendelIHT.datadir()))
julia> @time mses = cross_validate("normal", Normal, path=1:20);
245.610546 seconds (152.45 M allocations: 6.990 GiB, 1.27% gc time, 1.42% compilation time)
julia> @time mses = cross_validate("normal", Normal, path=1:20);
11.852532 seconds (22.98 M allocations: 531.709 MiB, 2.26% gc time)
A few more tests show the first call to cross_validate
can often take 400 or 500+ seconds
Also, this problem exacerbates with more cores.
4 cores (600 sec):
using Distributed
addprocs(4)
@everywhere using MendelIHT
cd(normpath(MendelIHT.datadir()))
julia> @time mses = cross_validate("normal", Normal, path=1:20);
668.532085 seconds (123.81 M allocations: 6.342 GiB, 0.68% gc time, 0.88% compilation time)
julia> @time mses = cross_validate("normal", Normal, path=1:20);
1.857292 seconds (123.08 k allocations: 12.577 MiB, 2.15% compilation time)
8 cores (1100 sec):
using Distributed
addprocs(8)
@everywhere using MendelIHT
cd(normpath(MendelIHT.datadir()))
julia> @time mses = cross_validate("normal", Normal, path=1:20);
1101.761663 seconds (123.81 M allocations: 6.342 GiB, 0.57% gc time, 0.35% compilation time)
julia> @time mses = cross_validate("normal", Normal, path=1:20);
1.091492 seconds (100.31 k allocations: 11.559 MiB)
Removing @_spawn
and @_sync
macros in SnpLinAlg
partially relieves the precompilation times with no performance regression. However, it is still slow.
1 core:
using MendelIHT
cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
178.086326 seconds (152.87 M allocations: 6.998 GiB, 1.50% gc time, 2.20% compilation time)
4 core
using Distributed
addprocs(4)
@everywhere using MendelIHT
cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
398.568449 seconds (123.74 M allocations: 6.338 GiB, 0.73% gc time, 0.91% compilation time)
@time mses = cross_validate("normal", Normal, path=1:20);
1.617772 seconds (96.55 k allocations: 10.984 MiB)
Interestingly, the same code on Julia 1.5.0 precompiles MUCH faster (with master branch SnpArrays), although timing for 4 cores is slower (2.3 sec on julia1.5 vs 1.6 sec on julia1.6)
1 core:
using MendelIHT
cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
63.740195 seconds (92.88 M allocations: 3.734 GiB, 3.63% gc time)
4 cores:
using Distributed
addprocs(4)
@everywhere using MendelIHT
cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
85.430388 seconds (49.52 M allocations: 2.241 GiB, 2.63% gc time)
@time mses = cross_validate("normal", Normal, path=1:20);
2.372901 seconds (104.69 k allocations: 11.298 MiB)
Cutting allocation in cross-validation partially alleviates the problem
1 core:
using MendelIHT
cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
119.331321 seconds (152.52 M allocations: 6.984 GiB, 2.18% gc time, 3.03% compilation time)
@time mses = cross_validate("normal", Normal, path=1:20);
5.301643 seconds (22.37 M allocations: 468.663 MiB, 2.00% gc time)
4 cores
using MendelIHT
cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
426.978452 seconds (130.85 M allocations: 6.625 GiB, 0.63% gc time, 0.81% compilation time)
@time mses = cross_validate("normal", Normal, path=1:20);
1.694783 seconds (184.54 k allocations: 45.308 MiB)
How about running precompilation with a smaller computation task? This might or might not help I guess...
Not sure.... I'm very confused at the moment.
5ec98be shifted most of the compilation time to pre-compilation, which may or may not be a good idea.
The result is that using MendelIHT
takes forever but subsequent function calls are fast.
1 thread:
@time using MendelIHT
149.479813 seconds (213.79 M allocations: 9.736 GiB, 2.26% gc time, 82.10% compilation time)
cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
6.014152 seconds (27.06 M allocations: 682.586 MiB, 4.24% gc time, 19.85% compilation time)
@time mses = cross_validate("normal", Normal, path=1:20);
5.075564 seconds (23.64 M allocations: 489.204 MiB, 4.89% gc time)
8 threads:
@time using MendelIHT
140.610868 seconds (219.84 M allocations: 10.071 GiB, 2.59% gc time, 93.64% compilation time)
cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
2.322273 seconds (27.86 M allocations: 700.764 MiB, 3.83% gc time, 52.01% compilation time)
@time mses = cross_validate("normal", Normal, path=1:20);
1.308751 seconds (24.58 M allocations: 512.554 MiB, 11.19% gc time)
Note: At some point we switched from using Distributed.jl
to using multithreading, which presumably explains some of the extra speed up with 1 core. Also, these tests were done on Julia 1.7.1 which may also be a lot faster than julia 1.5.x
Tim Holy said we should use PrecompileTools.jl to solve this problem in 2023 state of Julia talk