Flux & Zygote's AD slower than ForwardDiff

Open fangzhou-xie opened this issue 6 years ago • 4 comments

I came across Zygote's recent advances in AD and benchmarked it against ForwardDiff and Tracker, with the following results:

julia> @time ForwardDiff.gradient(rosenbrock,x);
  0.074665 seconds (9 allocations: 1.070 MiB)

julia> @time Tracker.gradient(rosenbrock,x);
  1.599425 seconds (937.69 k allocations: 2.269 GiB, 24.53% gc time)

julia> @time Zygote.gradient(rosenbrock,x);
  2.726697 seconds (660.25 k allocations: 4.490 GiB, 23.72% gc time)

where the function rosenbrock is taken from here [Edit: now here] and x = rand(10000). All three calls were run several times beforehand, so Julia's JIT compilation is not part of the timings.

I wonder what could be the reason for that?

fangzhou-xie • Jul 28 '19 21:07

I have been encountering poor performance as well. However, I cannot reproduce results that extreme. You should use BenchmarkTools and interpolate the arguments with $ to get more accurate benchmark results.

using Zygote, ForwardDiff, BenchmarkTools, Tracker, Random

# Fix the seed so the input vector is reproducible
Random.seed!(515)

# Rosenbrock function written as an explicit loop over scalar elements
function rosenbrock(x)
    a = one(eltype(x))
    b = 100 * a
    result = zero(eltype(x))
    for i in 1:length(x)-1
        result += (a - x[i])^2 + b * (x[i+1] - x[i]^2)^2
    end
    return result
end

x = rand(1000)

# Interpolate with $ so that global-variable lookup is not part of the timing
@btime ForwardDiff.gradient($rosenbrock, $x)
@btime Tracker.gradient($rosenbrock, $x)
@btime Zygote.gradient($rosenbrock, $x)

Results:

 4.272 ms (4 allocations: 110.72 KiB)
 17.131 ms (117906 allocations: 26.56 MiB)
 19.397 ms (72154 allocations: 48.29 MiB)
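
Before reading too much into timings like these, it can help to confirm that every backend returns the same gradient. The following is a minimal sketch of such a check using only Base Julia. The names `rosenbrock_grad` and `fd_grad` are hypothetical helpers introduced here for illustration, not part of any package, and the gradient is derived by hand from the loop version of rosenbrock used in this thread.

```julia
# Loop version from the thread, repeated so this snippet runs on its own.
function rosenbrock(x)
    a = one(eltype(x))
    b = 100 * a
    result = zero(eltype(x))
    for i in 1:length(x)-1
        result += (a - x[i])^2 + b * (x[i+1] - x[i]^2)^2
    end
    return result
end

# Hand-derived gradient (hypothetical helper). Term i of the sum contributes
# to both g[i] and g[i+1].
function rosenbrock_grad(x)
    a = one(eltype(x))
    b = 100 * a
    g = zero(x)
    for i in 1:length(x)-1
        d = x[i+1] - x[i]^2
        g[i]   += -2 * (a - x[i]) - 4 * b * x[i] * d
        g[i+1] += 2 * b * d
    end
    return g
end

# Central finite differences (hypothetical helper), used only as a slow,
# independent cross-check.
function fd_grad(f, x; h = 1e-6)
    g = similar(x)
    for i in eachindex(x)
        xp = copy(x); xp[i] += h
        xm = copy(x); xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2h)
    end
    return g
end

x = rand(10)
# Largest absolute difference between the two gradients; it should be tiny.
println(maximum(abs.(rosenbrock_grad(x) .- fd_grad(rosenbrock, x))))
```

The same comparison can be made against `ForwardDiff.gradient(rosenbrock, x)` and `Zygote.gradient(rosenbrock, x)[1]` (Zygote returns a tuple with one entry per argument).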

System information:

Ubuntu 18.04, Julia 1.1.1

(v1.1) pkg> st Zygote
    Status `~/.julia/environments/v1.1/Project.toml`
  [f6369f11] ForwardDiff v0.10.3
  [e88e6eb3] Zygote v0.3.2

(v1.1) pkg> st Tracker
    Status `~/.julia/environments/v1.1/Project.toml`
  [f6369f11] ForwardDiff v0.10.3
  [9f7883ad] Tracker v0.2.2

itsdfish • Jul 30 '19 08:07

I think this is mainly because tracing the for loop is a bit heavy for reverse mode, since each getindex has to be stored as an operation on the tape. In Tracker, this results in a bunch of getindex operations on the length-1000 array; in Zygote, IIUC, they are stored in a Zygote.Stack instead, which gives it a speed similar to Tracker's.
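
If the per-element getindex calls are the bottleneck, one thing to try is a formulation that avoids scalar indexing inside a loop. The sketch below is an assumption-laden illustration rather than a recommendation from this thread: `rosenbrock_vec` is a hypothetical name, and it computes the same value with two range slices and broadcasting. No timings are claimed for it here.

```julia
# Loop version from the thread, kept as a reference for comparison.
function rosenbrock(x)
    a = one(eltype(x))
    b = 100 * a
    result = zero(eltype(x))
    for i in 1:length(x)-1
        result += (a - x[i])^2 + b * (x[i+1] - x[i]^2)^2
    end
    return result
end

# Hypothetical vectorized variant (not from the thread): two range slices and
# broadcasting replace the per-element x[i] reads inside the loop.
function rosenbrock_vec(x)
    a = one(eltype(x))
    b = 100 * a
    head = x[1:end-1]   # x[i]   for i = 1, ..., n-1
    tail = x[2:end]     # x[i+1] for i = 1, ..., n-1
    return sum((a .- head) .^ 2 .+ b .* (tail .- head .^ 2) .^ 2)
end

x = rand(1000)
# Both formulations should agree up to floating-point rounding.
println(rosenbrock(x) ≈ rosenbrock_vec(x))
```

To see whether this helps on a given setup, it can be timed in the same way as the loop version, for example with `@btime Zygote.gradient($rosenbrock_vec, $x)`.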

Roger-luo • Aug 20 '19 02:08

This looks to be strictly a Zygote thing, so it could probably be moved there (or closed, if we think it's an inherent design limitation) rather than staying in Flux.

ToucheSir • Jun 14 '21 23:06

Indeed. This ought to be sped up by #962 and #981. Some timings:

julia> @btime ForwardDiff.gradient($rosenbrock,$x);
  534.750 μs (5 allocations: 111.86 KiB)

julia> @btime Tracker.gradient($rosenbrock,$x);
  3.340 ms (89931 allocations: 25.94 MiB)

julia> @btime Zygote.gradient($rosenbrock,$x);
  7.223 ms (73153 allocations: 48.21 MiB) # v0.6.11
  4.195 ms (71154 allocations: 9.92 MiB)  # v0.6.12, with 962
  1.857 ms (63171 allocations: 2.04 MiB)  # with 981

(Julia 1.6, M1 Mac + Rosetta). Bigger version:

julia> x = rand(10^5);

julia> @btime ForwardDiff.gradient($rosenbrock,$x);
  6.056 s (6 allocations: 10.68 MiB)

julia> @btime Tracker.gradient($rosenbrock,$x);
ERROR: StackOverflowError:

julia> @btime Zygote.gradient($rosenbrock,$x);
  28.132 s (7900279 allocations: 447.25 GiB) # v0.6.11
  18.026 s (7200286 allocations: 74.73 GiB)  # v0.6.12, with 962
  672.375 ms (6300305 allocations: 211.20 MiB)  # with 981

mcabbott • Jun 14 '21 23:06