Zygote.jl
Flux & Zygote's AD slower than ForwardDiff
I came across Zygote's recent advances in AD, tried benchmarking it, and found the following:
```julia
julia> @time ForwardDiff.gradient(rosenbrock,x);
  0.074665 seconds (9 allocations: 1.070 MiB)

julia> @time Tracker.gradient(rosenbrock,x);
  1.599425 seconds (937.69 k allocations: 2.269 GiB, 24.53% gc time)

julia> @time Zygote.gradient(rosenbrock,x);
  2.726697 seconds (660.25 k allocations: 4.490 GiB, 23.72% gc time)
```
where the function `rosenbrock` is taken from here [Edit: now here] and `x = rand(10000);`.
All three calls were run several times beforehand, so Julia's JIT compilation time is not included.
What could be the reason for this?
I have been seeing poor performance as well, although I cannot reproduce results that extreme. You should use BenchmarkTools and interpolate the global variables with `$` to get more accurate benchmark results:
```julia
using Zygote, ForwardDiff, BenchmarkTools, Tracker, Random
Random.seed!(515)

function rosenbrock(x)
    a = one(eltype(x))
    b = 100 * a
    result = zero(eltype(x))
    for i in 1:length(x)-1
        result += (a - x[i])^2 + b*(x[i+1] - x[i]^2)^2
    end
    return result
end

x = rand(1000)
@btime ForwardDiff.gradient($rosenbrock, $x)
@btime Tracker.gradient($rosenbrock, $x)
@btime Zygote.gradient($rosenbrock, $x)
```
Results:
```
  4.272 ms (4 allocations: 110.72 KiB)       # ForwardDiff
  17.131 ms (117906 allocations: 26.56 MiB)  # Tracker
  19.397 ms (72154 allocations: 48.29 MiB)   # Zygote
```
System information: Ubuntu 18.04, Julia 1.1.1
```
(v1.1) pkg> st Zygote
    Status `~/.julia/environments/v1.1/Project.toml`
  [f6369f11] ForwardDiff v0.10.3
  [e88e6eb3] Zygote v0.3.2

(v1.1) pkg> st Tracker
    Status `~/.julia/environments/v1.1/Project.toml`
  [f6369f11] ForwardDiff v0.10.3
  [9f7883ad] Tracker v0.2.2
```
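As a reference point for these timings (a sketch, not something posted in the thread): this Rosenbrock variant has a simple closed-form gradient. Writing r_i = x[i+1] - x[i]^2, loop iteration i contributes -2(a - x[i]) - 4b*x[i]*r_i to the derivative with respect to x[i] and 2b*r_i to the derivative with respect to x[i+1]. A hand-written version therefore needs a single O(n) pass and one output array. The names `rosenbrock_grad` and `fd_grad` below are hypothetical helpers, with a central finite-difference check included so the formula can be verified without any packages.

```julia
# Chained Rosenbrock function, copied from the benchmark above.
function rosenbrock(x)
    a = one(eltype(x))
    b = 100 * a
    result = zero(eltype(x))
    for i in 1:length(x)-1
        result += (a - x[i])^2 + b*(x[i+1] - x[i]^2)^2
    end
    return result
end

# Hand-written gradient (hypothetical helper, not part of any package).
# Term i only involves x[i] and x[i+1], so one O(n) pass suffices.
function rosenbrock_grad(x::AbstractVector{T}) where {T<:Real}
    a = one(T)
    b = 100 * a
    g = zeros(T, length(x))
    for i in 1:length(x)-1
        r = x[i+1] - x[i]^2                           # r_i = x[i+1] - x[i]^2
        g[i]   += -2 * (a - x[i]) - 4 * b * x[i] * r  # from (a - x[i])^2 and b*r_i^2
        g[i+1] += 2 * b * r                           # from b*r_i^2
    end
    return g
end

# Central finite differences, only used to sanity-check rosenbrock_grad.
function fd_grad(f, x; h = 1e-6)
    g = similar(x)
    for k in eachindex(x)
        xp = copy(x); xp[k] += h
        xm = copy(x); xm[k] -= h
        g[k] = (f(xp) - f(xm)) / (2h)
    end
    return g
end

x = rand(10)
println(maximum(abs.(rosenbrock_grad(x) .- fd_grad(rosenbrock, x))))  # should be small
```

With the benchmark packages loaded, `rosenbrock_grad(x) ≈ ForwardDiff.gradient(rosenbrock, x)` is also a quick correctness check for the AD results.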
I think this is mainly because tracing the for loop is expensive in reverse mode: every getindex has to be stored as an operation on the tape. In Tracker this produces a large number of getindex operations on the length-1000 array; in Zygote, IIUC, they are stored in a Zygote.Stack, which gives it a speed similar to Tracker.
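To make the tape argument concrete, here is a toy scalar reverse-mode AD in plain Julia. It is a sketch, not from the thread, and `Tape`, `TVar`, `record!`, `variable`, `grad`, and `rosenbrock_toy` are all hypothetical names. It is not how Tracker or Zygote work internally (Tracker tracks whole arrays, so each `x[i]` is itself a recorded getindex, and Zygote transforms source code), but it shows the general cost: every scalar operation in the loop appends an entry to the tape, which is then replayed backwards. ForwardDiff's dual numbers keep no such tape.

```julia
# Toy scalar reverse-mode AD with an explicit tape (illustration only).
# Each node stores up to two parent indices and the local partial
# derivatives with respect to them; index 0 means "no parent".
struct Tape
    parents::Vector{Tuple{Int,Int}}
    partials::Vector{Tuple{Float64,Float64}}
end
Tape() = Tape(Tuple{Int,Int}[], Tuple{Float64,Float64}[])

# A tracked scalar: its value plus its position on the tape.
struct TVar
    tape::Tape
    idx::Int
    val::Float64
end

# Append one node to the tape and return the tracked result.
function record!(t::Tape, val, p1, d1, p2 = 0, d2 = 0.0)
    push!(t.parents, (p1, p2))
    push!(t.partials, (Float64(d1), Float64(d2)))
    return TVar(t, length(t.parents), val)
end

variable(t::Tape, v::Real) = record!(t, v, 0, 0.0)

# Primitive rules: every call below adds exactly one tape entry.
Base.:+(x::TVar, y::TVar) = record!(x.tape, x.val + y.val, x.idx, 1.0, y.idx, 1.0)
Base.:-(x::TVar, y::TVar) = record!(x.tape, x.val - y.val, x.idx, 1.0, y.idx, -1.0)
Base.:*(x::TVar, y::TVar) = record!(x.tape, x.val * y.val, x.idx, y.val, y.idx, x.val)
Base.:+(x::TVar, c::Real) = record!(x.tape, x.val + c, x.idx, 1.0)
Base.:+(c::Real, x::TVar) = x + c
Base.:-(c::Real, x::TVar) = record!(x.tape, c - x.val, x.idx, -1.0)
Base.:*(c::Real, x::TVar) = record!(x.tape, c * x.val, x.idx, c)
Base.:^(x::TVar, n::Integer) = record!(x.tape, x.val^n, x.idx, n * x.val^(n - 1))

# Same loop as `rosenbrock`, with plain constants so `one(TVar)` is not needed.
function rosenbrock_toy(x)
    a, b = 1.0, 100.0
    result = 0.0                      # becomes a TVar after the first +=
    for i in 1:length(x)-1
        result += (a - x[i])^2 + b*(x[i+1] - x[i]^2)^2
    end
    return result
end

# Forward pass records the tape; backward pass replays it in reverse.
# Returns the gradient and the number of tape entries. Assumes length(x) >= 2.
function grad(f, x::Vector{Float64})
    t = Tape()
    vars = [variable(t, xi) for xi in x]
    y = f(vars)
    adj = zeros(length(t.parents))
    adj[y.idx] = 1.0
    for k in length(t.parents):-1:1
        p1, p2 = t.parents[k]
        d1, d2 = t.partials[k]
        p1 > 0 && (adj[p1] += d1 * adj[k])
        p2 > 0 && (adj[p2] += d2 * adj[k])
    end
    return [adj[v.idx] for v in vars], length(t.parents)
end

g, n_entries = grad(rosenbrock_toy, rand(1000))
println(n_entries)
```

For 1000 inputs this prints 8992: one entry per input plus 8 per loop iteration, so the tape (and the backward pass over it) grows linearly with the number of iterations, while the hand-derived gradient needs no tape at all.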
This looks like a Zygote-only issue and could probably be moved to the Zygote repository instead of Flux (or closed, if we consider it an inherent design limitation).
Indeed. This ought to be sped up by #962 and #981. Some timings:
```julia
julia> @btime ForwardDiff.gradient($rosenbrock,$x);
  534.750 μs (5 allocations: 111.86 KiB)

julia> @btime Tracker.gradient($rosenbrock,$x);
  3.340 ms (89931 allocations: 25.94 MiB)

julia> @btime Zygote.gradient($rosenbrock,$x);
  7.223 ms (73153 allocations: 48.21 MiB)  # v0.6.11
  4.195 ms (71154 allocations: 9.92 MiB)   # v0.6.12, with #962
  1.857 ms (63171 allocations: 2.04 MiB)   # with #981
```
(Julia 1.6, M1 Mac under Rosetta.) A bigger version:
```julia
julia> x = rand(10^5);

julia> @btime ForwardDiff.gradient($rosenbrock,$x);
  6.056 s (6 allocations: 10.68 MiB)

julia> @btime Tracker.gradient($rosenbrock,$x);
ERROR: StackOverflowError:

julia> @btime Zygote.gradient($rosenbrock,$x);
  28.132 s (7900279 allocations: 447.25 GiB)    # v0.6.11
  18.026 s (7200286 allocations: 74.73 GiB)     # v0.6.12, with #962
  672.375 ms (6300305 allocations: 211.20 MiB)  # with #981
```