Enzyme.jl

Reverse gradient works with CPU + KA but errors out after ~minutes on CUDA + KA

Open roflmaostc opened this issue 11 months ago • 3 comments

Hi,

the following reverse-mode gradient runs fine on the CPU, but on CUDA it is very slow (it takes minutes) and then errors out:

julia> using CUDA, Enzyme, RadonKA, DifferentiationInterface

julia> function main()
           arr = Array(rand(Float32, 256, 256, 32))
           angles = Array(Float32.(range(0, 2π, 200)))

           f(x, angles) = sum(radon(x, angles))
           radon(arr, angles)

           dx = 0 .* arr
           x = copy(arr)
           autodiff(Reverse, f, Active, Duplicated(x, dx), Const(angles))

           return 0
       end
main (generic function with 1 method)

julia> main()
0


julia> function main()
           arr = CuArray(rand(Float32, 256, 256, 32))
           angles = CuArray(Float32.(range(0, 2π, 200)))

           f(x, angles) = sum(radon(x, angles))
           radon(arr, angles)

           dx = 0 .* arr
           x = copy(arr)
           autodiff(Reverse, f, Active, Duplicated(x, dx), Const(angles))

           return 0
       end

julia> main()
ERROR: Constant memory is stored (or returned) to a differentiable variable.
As a result, Enzyme cannot provably ensure correctness and throws this error.
This might be due to the use of a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Runtime-Activity).
If Enzyme should be able to prove this use non-differentable, open an issue!
To work around this issue, either:
 a) rewrite this variable to not be conditionally active (fastest, but requires a code change), or
 b) set the Enzyme mode to turn on runtime activity (e.g. autodiff(set_runtime_activity(Reverse), ...) ). This will maintain correctness, but may slightly reduce performance.
Mismatched activity for:   store {} addrspace(10)* %1, {} addrspace(10)* addrspace(10)* %.fca.2.gep, align 8, !dbg !1368, !noalias !1335 const val: {} addrspace(10)* %1
 value=Unknown object of type CuArray{Float32, 1, CUDA.DeviceMemory}
 llvalue={} addrspace(10)* %1

Stacktrace:
 [1] #context!#990
   @ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/state.jl:168
 [2] context!
   @ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/state.jl:163
 [3] unsafe_copyto!
   @ ~/.julia/packages/CUDA/2kjXI/src/array.jl:575

Stacktrace:
  [1] #context!#990
    @ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/state.jl:168 [inlined]
  [2] context!
    @ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/state.jl:163 [inlined]
  [3] unsafe_copyto!
    @ ~/.julia/packages/CUDA/2kjXI/src/array.jl:575
  [4] copyto!
    @ ~/.julia/packages/CUDA/2kjXI/src/array.jl:517 [inlined]
  [5] copyto!
    @ ~/.julia/packages/CUDA/2kjXI/src/array.jl:521 [inlined]
  [6] copy
    @ ~/.julia/packages/CUDA/2kjXI/src/array.jl:179 [inlined]
  [7] CuArray
    @ ~/.julia/packages/CUDA/2kjXI/src/array.jl:422 [inlined]
  [8] _radon
    @ ~/.julia/packages/RadonKA/FXHjL/src/radon.jl:145
  [9] #radon#10
    @ ~/.julia/packages/RadonKA/FXHjL/src/radon.jl:126 [inlined]
 [10] radon
    @ ~/.julia/packages/RadonKA/FXHjL/src/radon.jl:123
 [11] f
    @ ./REPL[27]:5 [inlined]
 [12] diffejulia_f_184005wrap
    @ ./REPL[27]:0
 [13] macro expansion
    @ ~/.julia/packages/Enzyme/RTS5U/src/compiler.jl:8398 [inlined]
 [14] enzyme_call
    @ ~/.julia/packages/Enzyme/RTS5U/src/compiler.jl:7950 [inlined]
 [15] CombinedAdjointThunk
    @ ~/.julia/packages/Enzyme/RTS5U/src/compiler.jl:7723 [inlined]
 [16] autodiff
    @ ~/.julia/packages/Enzyme/RTS5U/src/Enzyme.jl:491 [inlined]
 [17] autodiff
    @ ~/.julia/packages/Enzyme/RTS5U/src/Enzyme.jl:512 [inlined]
 [18] main()
    @ Main ./REPL[27]:11
 [19] top-level scope
    @ REPL[28]:1

julia> versioninfo()
Julia Version 1.10.6
Commit 67dffc4a8ae (2024-10-28 12:23 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen 9 5900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 24 default, 0 interactive, 12 GC (on 24 virtual cores)
Environment:
  JULIA_NUM_THREADS = 24
  JULIA_MAX_NUM_PRECOMPILE_FILES = 100

julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.4
NVIDIA driver 550.120.0

CUDA libraries: 
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+550.120

Julia packages: 
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0

Toolchain:
- Julia: 1.10.6
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce RTX 3060 (sm_86, 11.012 GiB / 12.000 GiB available)
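
The error message above suggests workaround b), turning on runtime activity. Below is a minimal, untested sketch of how that might look for this reproducer. It assumes an Enzyme version that provides `set_runtime_activity` (as the error text implies). The helper name `radon_sum_gradient` is hypothetical and only used for illustration. Whether this actually resolves the error or the long compile time on CUDA is not confirmed here.

```julia
using CUDA, Enzyme, RadonKA

# Hypothetical helper (not part of the original report): computes the gradient
# of sum(radon(x, angles)) with respect to x, with runtime activity enabled.
function radon_sum_gradient(arr, angles)
    f(x, angles) = sum(radon(x, angles))

    x = copy(arr)
    dx = zero(x)  # zero-initialized shadow; Enzyme accumulates the gradient here

    # Same call as in the reproducer, except that the mode is wrapped in
    # set_runtime_activity, as suggested by option b) of the error message.
    autodiff(set_runtime_activity(Reverse), f, Active, Duplicated(x, dx), Const(angles))

    return dx
end

# Only attempt the GPU run when a working CUDA device is available.
if CUDA.functional()
    arr = CuArray(rand(Float32, 256, 256, 32))
    angles = CuArray(Float32.(range(0, 2π, 200)))
    dx = radon_sum_gradient(arr, angles)
    @show size(dx)
end
```

Passing the arrays in as arguments means the same function can be called with plain `Array` inputs, so the CPU and CUDA paths can be compared directly. Per the error text, runtime activity may slightly reduce performance.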

roflmaostc, Nov 28 '24 14:11