Flux.jl icon indicating copy to clipboard operation
Flux.jl copied to clipboard

Add SnoopPrecompile

Open mcabbott opened this issue 3 years ago • 2 comments

Like https://github.com/FluxML/Zygote.jl/pull/1281 this adds https://github.com/timholy/SnoopCompile.jl

Timing some gradient of a model, it's not so obvious this saves time.

M1 mac, Julia 1.9- nightly

### Zygote v0.6.49, Flux v0.13.6

% time julia -e 'using Flux; gradient(m -> Flux.mse(m(rand(Float32,10,10,1,1)), rand(2,1)), Chain(Conv((3,3),1=>2), Flux.flatten, Dense(128=>2), softmax));' 
julia -e   29.41s user 2.05s system 115% cpu 27.155 total

% time julia -e '@time using Flux; @time gradient(m -> Flux.mse(m(rand(Float32,10,10,1,1)), rand(2,1)), Chain(Conv((3,3),1=>2), Flux.flatten, Dense(128=>2), softmax));'
  9.128240 seconds (16.43 M allocations: 1.131 GiB, 3.08% gc time, 47.75% compilation time: 67% of which was recompilation)
 17.482574 seconds (38.85 M allocations: 2.449 GiB, 4.74% gc time, 99.99% compilation time)
julia -e   28.84s user 1.98s system 113% cpu 27.189 total

### With Zygote#master:

% time julia -e '@time using Flux; @time gradient(m -> Flux.mse(m(rand(Float32,10,10,1,1)), rand(2,1)), Chain(Conv((3,3),1=>2), Flux.flatten, Dense(128=>2), softmax));'
  9.281118 seconds (16.90 M allocations: 1.167 GiB, 3.08% gc time, 46.41% compilation time: 66% of which was recompilation)
 12.865140 seconds (17.35 M allocations: 1.074 GiB, 2.66% gc time, 99.99% compilation time)
julia -e   25.27s user 1.89s system 119% cpu 22.726 total

% time julia -e '@time using Flux; @time gradient(m -> Flux.mse(m(rand(Float32,10,10,1,1)), rand(2,1)), Chain(Conv((3,3),1=>2), Flux.flatten, Dense(128=>2), softmax));'
  9.423306 seconds (16.90 M allocations: 1.167 GiB, 3.03% gc time, 46.45% compilation time: 67% of which was recompilation)
 13.019594 seconds (17.35 M allocations: 1.073 GiB, 2.73% gc time, 99.99% compilation time)
julia -e   24.13s user 1.86s system 112% cpu 23.022 total

### Zygote#master  + This PR

% time julia -e '@time using Flux; @time gradient(m -> Flux.mse(m(rand(Float32,10,10,1,1)), rand(2,1)), Chain(Conv((3,3),1=>2), Flux.flatten, Dense(128=>2), softmax));'
 11.086559 seconds (18.63 M allocations: 1.297 GiB, 6.81% gc time, 41.08% compilation time: 66% of which was recompilation)
 12.130469 seconds (7.22 M allocations: 461.883 MiB, 10.48% gc time, 99.99% compilation time)
julia -e   24.53s user 1.50s system 109% cpu 23.814 total

% time julia -e '@time using Flux; @time gradient(m -> Flux.mse(m(rand(Float32,10,10,1,1)), rand(2,1)), Chain(Conv((3,3),1=>2), Flux.flatten, Dense(128=>2), softmax));'
 10.476392 seconds (18.63 M allocations: 1.297 GiB, 3.49% gc time, 41.02% compilation time: 67% of which was recompilation)
 11.062946 seconds (7.22 M allocations: 461.883 MiB, 1.36% gc time, 99.99% compilation time)
julia -e   22.83s user 2.00s system 112% cpu 22.114 total

Cluster xeon with a GPU, Julia 1.8.0

### With Zygote#master

$ time julia-1.8.0/bin/julia -e '@time using Flux; @time gradient(m -> Flux.mse(m(rand(Float32,10,10,1,1)), rand(2,1)), Chain(Conv((3,3),1=>2), Flux.flatten, Dense(128=>2), softmax));'
 21.593748 seconds (22.75 M allocations: 1.485 GiB, 5.96% gc time, 23.77% compilation time: 70% of which was recompilation)
 37.201255 seconds (39.95 M allocations: 2.012 GiB, 4.36% gc time, 99.99% compilation time)

real	0m59.329s
user	1m2.320s
sys	0m2.783s

$ time julia-1.8.0/bin/julia -e '@time using Flux; @time gradient(m -> Flux.mse(m(rand(Float32,10,10,1,1)), rand(2,1)), Chain(Conv((3,3),1=>2), Flux.flatten, Dense(128=>2), softmax));'
 21.987707 seconds (22.75 M allocations: 1.485 GiB, 5.96% gc time, 24.21% compilation time: 69% of which was recompilation)
 37.505275 seconds (39.95 M allocations: 2.011 GiB, 4.39% gc time, 99.99% compilation time)

real	1m0.019s
user	1m2.634s
sys	0m2.701s

### + this PR

$ time julia-1.8.0/bin/julia -e '@time using Flux; @time gradient(m -> Flux.mse(m(rand(Float32,10,10,1,1)), rand(2,1)), Chain(Conv((3,3),1=>2), Flux.flatten, Dense(128=>2), softmax));'
 26.026605 seconds (26.32 M allocations: 1.719 GiB, 5.71% gc time, 20.17% compilation time: 70% of which was recompilation)
 30.415214 seconds (19.08 M allocations: 1002.718 MiB, 3.32% gc time, 99.99% compilation time)

real	0m56.918s
user	0m57.991s
sys	0m2.820s

$ time julia-1.8.0/bin/julia -e '@time using Flux; @time gradient(m -> Flux.mse(m(rand(Float32,10,10,1,1)), rand(2,1)), Chain(Conv((3,3),1=>2), Flux.flatten, Dense(128=>2), softmax));'
 25.925542 seconds (26.32 M allocations: 1.719 GiB, 5.61% gc time, 20.24% compilation time: 70% of which was recompilation)
 30.721970 seconds (19.08 M allocations: 1002.714 MiB, 3.69% gc time, 99.99% compilation time)

real	0m57.128s
user	0m58.368s
sys	0m3.024s

mcabbott avatar Oct 23 '22 03:10 mcabbott

Shall we wait until https://github.com/JuliaLang/julia/pull/47184 lands in nightly to re-evaluate timings? That chain of PRs should reduce the second number significantly. I'm also surprised that timings on the cluster are so slow—maybe something to do with slow file IO?

ToucheSir avatar Oct 25 '22 15:10 ToucheSir

If revived this should probably follow https://github.com/FluxML/Zygote.jl/pull/1415

mcabbott avatar May 11 '23 14:05 mcabbott