StackOverflowError when using Enzyme with Flux
I don't know where to begin troubleshooting this, making a minimal example, or even choosing a more specific title. This is my first time trying Enzyme.
I changed my Flux.train! call from:
Flux.train!(network, (training_data,), opt_state)
to:
Flux.train!(Duplicated(network, make_zero(network)), (training_data,), opt_state)
But I'm not sure that's correct; the documentation for this usage is a bit sparse.
I'm also not sure why only 22 frames are shown.
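As a starting point for an MWE, here is a minimal sketch of the same Enzyme call pattern on a toy model. Everything in it is a hypothetical stand-in: the two-layer Chain replaces DecodeNet, random Float32 arrays replace training_data, and Flux.mse with Adam(1e-3) replaces the real loss and schedule. It also assumes the loss is supplied through a do-block, since the stack trace shows train! receiving a loss::Function that the snippets above don't include. Treat it as a sketch of the pattern, not a verified reproduction of the error.

```julia
using Flux, Enzyme

# Hypothetical stand-ins for the report's DecodeNet model and training_data
model = Chain(Dense(4 => 8, relu), Dense(8 => 2))
x = randn(Float32, 4, 16)
y = randn(Float32, 2, 16)
training_data = (x, y)

# Optimiser state is set up on the plain model, as in the Zygote workflow
opt_state = Flux.setup(Adam(1e-3), model)

# Enzyme needs a zeroed "shadow" copy of the model to accumulate gradients into
dup_model = Duplicated(model, Enzyme.make_zero(model))

# Each data item is a tuple that train! splats into the loss as (m, xb, yb)
Flux.train!(dup_model, (training_data,), opt_state) do m, xb, yb
    Flux.mse(m(xb), yb)
end
```

The Zygote path that the thread ends up using is the same call with the plain model passed in place of dup_model.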
ERROR: StackOverflowError:
Stacktrace:
[1] LLVMRunPassManager
@ C:\Users\nicho\.julia\packages\LLVM\UqMfW\lib\15\libLLVM.jl:3385 [inlined]
[2] run!
@ C:\Users\nicho\.julia\packages\LLVM\UqMfW\src\passmanager.jl:39 [inlined]
[3] #18868
@ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler\optimize.jl:2010 [inlined]
[4] LLVM.ModulePassManager(::Enzyme.Compiler.var"#18868#18875"{LLVM.Module}; kwargs::@Kwargs{})
@ LLVM C:\Users\nicho\.julia\packages\LLVM\UqMfW\src\passmanager.jl:33
[5] ModulePassManager
@ C:\Users\nicho\.julia\packages\LLVM\UqMfW\src\passmanager.jl:30 [inlined]
[6] removeDeadArgs!(mod::LLVM.Module, tm::LLVM.TargetMachine)
@ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler\optimize.jl:2008
[7] post_optimze!(mod::LLVM.Module, tm::LLVM.TargetMachine, machine::Bool)
@ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler\optimize.jl:2283
[8] post_optimze!(mod::LLVM.Module, tm::LLVM.TargetMachine)
@ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler\optimize.jl:2282
[9] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
@ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7260
[10] _thunk
@ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7241 [inlined]
[11] cached_compilation
@ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7282 [inlined]
[12] thunkbase(ctx::LLVM.Context, mi::Core.MethodInstance, ::Val{…}, ::Type{…}, ::Type{…}, tt::Type{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Type{…}, ::Val{…})
@ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7355
[13] #s2055#19000
@ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7407 [inlined]
[14]
@ Enzyme.Compiler .\none:0
[15] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
@ Core .\boot.jl:602
[16] autodiff(::ReverseMode{…}, ::Const{…}, ::Type{…}, ::Const{…}, ::Duplicated{…}, ::Const{…}, ::Const{…})
@ Enzyme C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\Enzyme.jl:263
[17] autodiff
@ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\Enzyme.jl:332 [inlined]
[18] macro expansion
@ C:\Users\nicho\.julia\packages\Flux\HBF2N\ext\FluxEnzymeExt\FluxEnzymeExt.jl:34 [inlined]
[19] macro expansion
@ C:\Users\nicho\.julia\packages\ProgressLogging\6KXlp\src\ProgressLogging.jl:328 [inlined]
[20] train!(loss::Function, model::Duplicated{…}, data::Tuple{…}, opt::@NamedTuple{…}; cb::Nothing)
@ FluxEnzymeExt C:\Users\nicho\.julia\packages\Flux\HBF2N\ext\FluxEnzymeExt\FluxEnzymeExt.jl:30
[21] train!(loss::Function, model::Duplicated{DecodeNet{…}}, data::Tuple{Tuple{…}}, opt::@NamedTuple{arch::@NamedTuple{…}})
@ FluxEnzymeExt C:\Users\nicho\.julia\packages\Flux\HBF2N\ext\FluxEnzymeExt\FluxEnzymeExt.jl:27
[22] train_network(name::String; learning_rate_schedule::Vector{…}, training_batch_size::Int64, evaluation_batch_size::Int64, iters_per_eval::Int64, seed::Int64, decode::Bool, wandb::Bool)
@ Main c:\Users\nicho\Repos\DeepLoco.jl\src\train.jl:213
Can you include complete, runnable code to reproduce this, and also show your OS and package versions?
I'll see if I can boil this down to an MWE. In the meantime:
Julia Version 1.10.5
Commit 6f3fdf7b36 (2024-08-27 14:19 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 12 × Snapdragon(R) X 12-core X1E80100 @ 3.40 GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, bdver1)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS =
Status `~/Repos/DeepLoco.jl/Project.toml`
⌅ [052768ef] CUDA v5.4.3
[082447d4] ChainRules v1.71.0
[992eb4ea] CondaPkg v0.2.23
[b4f34e82] Distances v0.10.11
[31c24e10] Distributions v0.25.111
⌃ [7da242da] Enzyme v0.12.36
[587475ba] Flux v0.14.19
[033835bb] JLD2 v0.5.2
[f1d291b0] MLUtils v0.4.4
[91a5bcdd] Plots v1.40.8
[6099a3de] PythonCall v0.9.23
[e88e6eb3] Zygote v0.6.70
[02a925ec] cuDNN v1.3.2
[37e2e46d] LinearAlgebra
[9a3f8284] Random
[10745b16] Statistics v1.10.0
I see Enzyme was recently bumped to 0.13, but it seems Flux doesn't support it yet. I also realize this is x86_64 Julia running under emulation on Windows on ARM; I'll try AArch64 Julia via WSL later.
Is this still an issue, btw?
I haven't followed up because I got Zygote to work, so feel free to close.