Enzyme.jl
Reverse gradient works with CPU + KA but errors out after several minutes on CUDA + KA
Hi,
The following runs fine on the CPU, but on CUDA it is very slow and eventually errors out:
julia> using CUDA, Enzyme, RadonKA, DifferentiationInterface
julia> function main()
           arr = Array(rand(Float32, 256, 256, 32))
           angles = Array(Float32.(range(0, 2π, 200)))
           f(x, angles) = sum(radon(x, angles))
           radon(arr, angles)
           dx = 0 .* arr
           x = copy(arr)
           autodiff(Reverse, f, Active, Duplicated(x, dx), Const(angles))
           return 0
       end
main (generic function with 1 method)
julia> main()
0
julia> function main()
           arr = CuArray(rand(Float32, 256, 256, 32))
           angles = CuArray(Float32.(range(0, 2π, 200)))
           f(x, angles) = sum(radon(x, angles))
           radon(arr, angles)
           dx = 0 .* arr
           x = copy(arr)
           autodiff(Reverse, f, Active, Duplicated(x, dx), Const(angles))
           return 0
       end
julia> main()
ERROR: Constant memory is stored (or returned) to a differentiable variable.
As a result, Enzyme cannot provably ensure correctness and throws this error.
This might be due to the use of a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Runtime-Activity).
If Enzyme should be able to prove this use non-differentable, open an issue!
To work around this issue, either:
a) rewrite this variable to not be conditionally active (fastest, but requires a code change), or
b) set the Enzyme mode to turn on runtime activity (e.g. autodiff(set_runtime_activity(Reverse), ...) ). This will maintain correctness, but may slightly reduce performance.
Mismatched activity for: store {} addrspace(10)* %1, {} addrspace(10)* addrspace(10)* %.fca.2.gep, align 8, !dbg !1368, !noalias !1335 const val: {} addrspace(10)* %1
value=Unknown object of type CuArray{Float32, 1, CUDA.DeviceMemory}
llvalue={} addrspace(10)* %1
Stacktrace:
[1] #context!#990
@ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/state.jl:168
[2] context!
@ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/state.jl:163
[3] unsafe_copyto!
@ ~/.julia/packages/CUDA/2kjXI/src/array.jl:575
Stacktrace:
[1] #context!#990
@ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/state.jl:168 [inlined]
[2] context!
@ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/state.jl:163 [inlined]
[3] unsafe_copyto!
@ ~/.julia/packages/CUDA/2kjXI/src/array.jl:575
[4] copyto!
@ ~/.julia/packages/CUDA/2kjXI/src/array.jl:517 [inlined]
[5] copyto!
@ ~/.julia/packages/CUDA/2kjXI/src/array.jl:521 [inlined]
[6] copy
@ ~/.julia/packages/CUDA/2kjXI/src/array.jl:179 [inlined]
[7] CuArray
@ ~/.julia/packages/CUDA/2kjXI/src/array.jl:422 [inlined]
[8] _radon
@ ~/.julia/packages/RadonKA/FXHjL/src/radon.jl:145
[9] #radon#10
@ ~/.julia/packages/RadonKA/FXHjL/src/radon.jl:126 [inlined]
[10] radon
@ ~/.julia/packages/RadonKA/FXHjL/src/radon.jl:123
[11] f
@ ./REPL[27]:5 [inlined]
[12] diffejulia_f_184005wrap
@ ./REPL[27]:0
[13] macro expansion
@ ~/.julia/packages/Enzyme/RTS5U/src/compiler.jl:8398 [inlined]
[14] enzyme_call
@ ~/.julia/packages/Enzyme/RTS5U/src/compiler.jl:7950 [inlined]
[15] CombinedAdjointThunk
@ ~/.julia/packages/Enzyme/RTS5U/src/compiler.jl:7723 [inlined]
[16] autodiff
@ ~/.julia/packages/Enzyme/RTS5U/src/Enzyme.jl:491 [inlined]
[17] autodiff
@ ~/.julia/packages/Enzyme/RTS5U/src/Enzyme.jl:512 [inlined]
[18] main()
@ Main ./REPL[27]:11
[19] top-level scope
@ REPL[28]:1
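For reference, workaround (b) from the error message above could be written roughly as follows. This is an untested sketch: it only swaps `Reverse` for `set_runtime_activity(Reverse)`, as the message suggests, and it is not known whether that resolves the error or the long run time on CUDA. The names `radon_sum` and `radon_gradient` are made up for this illustration and are not part of RadonKA or Enzyme.

```julia
using Enzyme, RadonKA

# Scalar loss, same as `f` in the reproducer above.
radon_sum(x, angles) = sum(radon(x, angles))

# Hypothetical helper: reverse-mode gradient of `radon_sum` with respect to `arr`,
# with Enzyme's runtime activity enabled (option b in the error message).
function radon_gradient(arr, angles)
    x = copy(arr)
    dx = zero(arr)  # shadow array; the gradient is accumulated into it
    autodiff(set_runtime_activity(Reverse), radon_sum, Active,
             Duplicated(x, dx), Const(angles))
    return dx
end

# Assumed usage on the GPU, mirroring the failing call above:
# using CUDA
# dx = radon_gradient(CuArray(rand(Float32, 256, 256, 32)),
#                     CuArray(Float32.(range(0, 2π, 200))))
```

Taking the arrays as arguments keeps the helper independent of the array type, so the same function can be tried with `Array` first and with `CuArray` afterwards.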
julia> versioninfo()
Julia Version 1.10.6
Commit 67dffc4a8ae (2024-10-28 12:23 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 24 × AMD Ryzen 9 5900X 12-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 24 default, 0 interactive, 12 GC (on 24 virtual cores)
Environment:
JULIA_NUM_THREADS = 24
JULIA_MAX_NUM_PRECOMPILE_FILES = 100
julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.4
NVIDIA driver 550.120.0
CUDA libraries:
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+550.120
Julia packages:
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0
Toolchain:
- Julia: 1.10.6
- LLVM: 15.0.7
1 device:
0: NVIDIA GeForce RTX 3060 (sm_86, 11.012 GiB / 12.000 GiB available)