MendelIHT.jl icon indicating copy to clipboard operation
MendelIHT.jl copied to clipboard

Precompilation takes forever

Open biona001 opened this issue 3 years ago • 9 comments

Some recent changes caused precompilation to be unacceptably slow (200-500+ seconds). MWE

using MendelIHT
cd(normpath(MendelIHT.datadir()))

julia> @time mses = cross_validate("normal", Normal, path=1:20); 
245.610546 seconds (152.45 M allocations: 6.990 GiB, 1.27% gc time, 1.42% compilation time)

julia> @time mses = cross_validate("normal", Normal, path=1:20); 
 11.852532 seconds (22.98 M allocations: 531.709 MiB, 2.26% gc time)

biona001 avatar Jul 07 '21 00:07 biona001

A few more tests show the first call to cross_validate can often take 400 or 500+ seconds

biona001 avatar Jul 07 '21 02:07 biona001

Also, this problem exacerbates with more cores.

4 cores (600 sec):

using Distributed 
addprocs(4)
@everywhere using MendelIHT
cd(normpath(MendelIHT.datadir()))

julia> @time mses = cross_validate("normal", Normal, path=1:20);
668.532085 seconds (123.81 M allocations: 6.342 GiB, 0.68% gc time, 0.88% compilation time)

julia> @time mses = cross_validate("normal", Normal, path=1:20);
1.857292 seconds (123.08 k allocations: 12.577 MiB, 2.15% compilation time)

8 cores (1100 sec):

using Distributed 
addprocs(8)
@everywhere using MendelIHT
cd(normpath(MendelIHT.datadir()))

julia> @time mses = cross_validate("normal", Normal, path=1:20);
1101.761663 seconds (123.81 M allocations: 6.342 GiB, 0.57% gc time, 0.35% compilation time)

julia> @time mses = cross_validate("normal", Normal, path=1:20);
1.091492 seconds (100.31 k allocations: 11.559 MiB)

biona001 avatar Jul 07 '21 04:07 biona001

Removing @_spawn and @_sync macros in SnpLinAlg partially relieves the precompilation times with no performance regression. However, it is still slow.

1 core:

using MendelIHT
cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
178.086326 seconds (152.87 M allocations: 6.998 GiB, 1.50% gc time, 2.20% compilation time)

4 core

using Distributed
addprocs(4)
@everywhere using MendelIHT
cd(normpath(MendelIHT.datadir()))

@time mses = cross_validate("normal", Normal, path=1:20);
398.568449 seconds (123.74 M allocations: 6.338 GiB, 0.73% gc time, 0.91% compilation time)

@time mses = cross_validate("normal", Normal, path=1:20);
1.617772 seconds (96.55 k allocations: 10.984 MiB)

biona001 avatar Jul 07 '21 04:07 biona001

Interestingly, the same code on Julia 1.5.0 precompiles MUCH faster (with master branch SnpArrays), although timing for 4 cores is slower (2.3 sec on julia1.5 vs 1.6 sec on julia1.6)

1 core:

using MendelIHT
cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
 63.740195 seconds (92.88 M allocations: 3.734 GiB, 3.63% gc time)

4 cores:

using Distributed
addprocs(4)
@everywhere using MendelIHT
cd(normpath(MendelIHT.datadir()))

@time mses = cross_validate("normal", Normal, path=1:20);
85.430388 seconds (49.52 M allocations: 2.241 GiB, 2.63% gc time)

@time mses = cross_validate("normal", Normal, path=1:20);
2.372901 seconds (104.69 k allocations: 11.298 MiB)

biona001 avatar Jul 07 '21 05:07 biona001

Cutting allocation in cross-validation partially alleviates the problem

1 core:

using MendelIHT
cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
119.331321 seconds (152.52 M allocations: 6.984 GiB, 2.18% gc time, 3.03% compilation time)

@time mses = cross_validate("normal", Normal, path=1:20);
  5.301643 seconds (22.37 M allocations: 468.663 MiB, 2.00% gc time)

4 cores

using MendelIHT
cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
426.978452 seconds (130.85 M allocations: 6.625 GiB, 0.63% gc time, 0.81% compilation time)

@time mses = cross_validate("normal", Normal, path=1:20);
  1.694783 seconds (184.54 k allocations: 45.308 MiB)

biona001 avatar Jul 08 '21 03:07 biona001

How about running precompilation with a smaller computation task? This might or might not help I guess...

kose-y avatar Jul 08 '21 04:07 kose-y

Not sure.... I'm very confused at the moment.

biona001 avatar Jul 08 '21 06:07 biona001

5ec98be shifted most of the compilation time to pre-compilation, which may or may not be a good idea. The result is that using MendelIHT takes forever but subsequent function calls are fast.

1 thread:

@time using MendelIHT 
149.479813 seconds (213.79 M allocations: 9.736 GiB, 2.26% gc time, 82.10% compilation time)

cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
6.014152 seconds (27.06 M allocations: 682.586 MiB, 4.24% gc time, 19.85% compilation time)

@time mses = cross_validate("normal", Normal, path=1:20);
5.075564 seconds (23.64 M allocations: 489.204 MiB, 4.89% gc time)

8 threads:

@time using MendelIHT
140.610868 seconds (219.84 M allocations: 10.071 GiB, 2.59% gc time, 93.64% compilation time)

cd(normpath(MendelIHT.datadir()))
@time mses = cross_validate("normal", Normal, path=1:20);
2.322273 seconds (27.86 M allocations: 700.764 MiB, 3.83% gc time, 52.01% compilation time)

@time mses = cross_validate("normal", Normal, path=1:20);
1.308751 seconds (24.58 M allocations: 512.554 MiB, 11.19% gc time)

Note: At some point we switched from using Distributed.jl to using multithreading, which presumably explains some of the extra speed up with 1 core. Also, these tests were done on Julia 1.7.1 which may also be a lot faster than julia 1.5.x

biona001 avatar Feb 07 '22 04:02 biona001

Tim Holy said we should use PrecompileTools.jl to solve this problem in 2023 state of Julia talk

biona001 avatar Aug 10 '23 18:08 biona001