CUDAnative.jl

WIP: Use contextual dispatch for replacing functions

Open vchuravy opened this issue 6 years ago • 57 comments

On 1.1 Cassette should be performant enough for these kinds of transforms.

Fixes https://github.com/JuliaGPU/CUDAnative.jl/issues/27

@maleadt did you have a branch similar to this around?

vchuravy avatar Jan 29 '19 23:01 vchuravy
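For context, the "contextual dispatch for replacing functions" technique this PR builds on can be sketched with plain Cassette. The `CUDACtx` name and the `sin` → `cos` substitution below are illustrative stand-ins, not the PR's actual definitions; the real goal is rerouting Base functions to CUDA device intrinsics:

```julia
using Cassette

# Define a hypothetical context; inside it we can reroute selected calls.
Cassette.@context CUDACtx

# Contextual dispatch: within CUDACtx, calls to `sin` hit this method
# instead of Cassette recursing into `sin` itself. A real GPU context would
# call a device intrinsic here; `cos` is just a visible stand-in.
Cassette.overdub(::CUDACtx, ::typeof(sin), x) = cos(x)

kernel(x) = sin(x) + 1.0

# Outside the context, kernel(0.0) == 1.0; overdubbed, sin → cos gives 2.0.
result = Cassette.overdub(CUDACtx(), kernel, 0.0)
```

The appeal over source rewriting is that the replacement follows ordinary method dispatch, so it composes with user code that the package never sees.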

bors try

vchuravy avatar Jan 29 '19 23:01 vchuravy

try

Build failed

bors[bot] avatar Jan 29 '19 23:01 bors[bot]

Yes, https://github.com/JuliaGPU/CUDAnative.jl/compare/tb/cassette It didn't work because of plenty of allocations, invokes, dispatches, etc. Is your approach different in that regard? Also, #265.

maleadt avatar Jan 30 '19 07:01 maleadt

bors try

vchuravy avatar Jan 30 '19 18:01 vchuravy

As bors tells us, apparently not ;)

@jrevels https://gitlab.com/JuliaGPU/CUDAnative.jl/-/jobs/153739960 is full of interesting cases.

vchuravy avatar Jan 30 '19 18:01 vchuravy

try

Build failed

bors[bot] avatar Jan 30 '19 18:01 bors[bot]

bors try

vchuravy avatar Jan 30 '19 21:01 vchuravy

try

Build failed

bors[bot] avatar Jan 30 '19 22:01 bors[bot]

Yeah, as I feared... Let's mark this WIP then :slightly_frowning_face:

maleadt avatar Jan 31 '19 15:01 maleadt

bors try

vchuravy avatar Feb 01 '19 02:02 vchuravy

try

Build failed

bors[bot] avatar Feb 01 '19 02:02 bors[bot]

Same error count; inlining doesn't help. That said, many stack traces point to `getindex` again, so maybe there's only a small number of errors remaining. I'll have another go at reducing `vadd` when I have some time.

maleadt avatar Feb 01 '19 06:02 maleadt

I was planning on grabbing Jarrett this week to see if we can figure it out. (I am in the process of adding GPU support to Cthulhu, so that should make it easier.)

vchuravy avatar Feb 04 '19 03:02 vchuravy

bors try

vchuravy avatar Feb 07 '19 23:02 vchuravy

try

Build failed

bors[bot] avatar Feb 07 '19 23:02 bors[bot]

Ok! The debugging session with Jarrett proved fruitful; we are down to 10ish failures :)

vchuravy avatar Feb 07 '19 23:02 vchuravy

Cool! What were the changes?

maleadt avatar Feb 08 '19 17:02 maleadt

Cool! What were the changes?

We applied my usual Cassette issue workaround of "isolate the problematic thing and make it a contextual primitive (i.e. don't overdub into it)". The problematic thing here was the `@pure` function `datatype_align`.

It turns out that while Cassette propagates purity to the compiler correctly, the compiler is (probably rightfully) pessimistic and just bails out on purity optimization for generated functions (i.e. `overdub`). ref https://github.com/JuliaLang/julia/pull/31012, which is my naive attempt at changing the compiler to allow this sort of thing. If that lands, we can remove the extra contextual primitive definition here.

jrevels avatar Feb 08 '19 17:02 jrevels
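The workaround described above can be sketched as follows. `CUDACtx` and `troublesome` are invented stand-ins for the actual context and for `datatype_align`; the point is only the shape of the primitive definition:

```julia
using Cassette

Cassette.@context CUDACtx   # hypothetical context name

# Stand-in for a function (like the `@pure` `datatype_align`) whose body
# Cassette must not recurse into.
troublesome(x) = x + 1

# The workaround: make it a contextual primitive by overriding `overdub` to
# call the function directly, so Cassette never rewrites its body and the
# compiler sees the original (pure) call.
Cassette.overdub(::CUDACtx, ::typeof(troublesome), x) = troublesome(x)

out = Cassette.overdub(CUDACtx(), x -> troublesome(x) * 2, 3)   # (3 + 1) * 2
```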

bors try

vchuravy avatar Feb 11 '19 17:02 vchuravy

try

Build failed

bors[bot] avatar Feb 11 '19 17:02 bors[bot]

I think we are down to two Cassette-related issues, while the rest is adjustment of tests / the extra level of indirection messing things up.

vchuravy avatar Feb 11 '19 18:02 vchuravy

julia> function kernel1(T, i)
           sink(i)
           return
       end
kernel1 (generic function with 1 method)

julia> @cuda kernel1(Int, 1)
ERROR: InvalidIRError: compiling #103(Type{Int64}, Int64) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f_tuple)
Stacktrace:
 [1] #103 at /home/tbesard/Julia/CUDAnative/src/context.jl:51
Reason: unsupported call to the Julia runtime (call to jl_f_getfield)
Stacktrace:
 [1] #103 at /home/tbesard/Julia/CUDAnative/src/context.jl:51

julia> inner_kwargf(foobar;foo=1, bar=2) = nothing
inner_kwargf (generic function with 1 method)

julia> @cuda (()->inner_kwargf(42;foo=1,bar=2))()
ERROR: GPU compilation of #103() failed
KernelError: kernel returns a value of type `Any`

There are some more obscure errors as well, but these are the obvious codegen-related ones.

bors try

maleadt avatar Feb 12 '19 11:02 maleadt

try

Build failed

bors[bot] avatar Feb 12 '19 12:02 bors[bot]

I really dislike the loss of method redefinition support though, so either we need a proper fix or a hack (like emptying the CUDAnative compile cache upon every REPL execution -- but we don't have a useful REPL API for that) to support redefinitions.

EDIT: even emptying the compile cache isn't sufficient, there's other caching going on

# valid def
julia> foo() = nothing
julia> @cuda foo()

# invalid def
julia> foo() = 1
julia> @cuda foo()

# works, too bad

# I expected this to fail
julia> empty!(CUDAnative.compilecache); @cuda foo()

# to show the def is really invalid
julia> bar() = 1
julia> @cuda bar()
ERROR: GPU compilation of #103() failed
KernelError: kernel returns a value of type `Int64`

maleadt avatar Feb 12 '19 12:02 maleadt
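The caching behavior shown in the REPL session above can be illustrated with a self-contained sketch. `fake_compilecache` and `fake_compile` are invented names; CUDAnative's real cache and keys differ, but the failure mode is the same: the key doesn't change when a method is redefined.

```julia
# Hypothetical sketch of why a compile cache keyed on the function type
# defeats redefinition: redefining `foo` replaces its method, but
# `typeof(foo)` -- and hence the cache key -- stays the same, so the stale
# compiled artifact keeps being served.
const fake_compilecache = Dict{Any,String}()

function fake_compile(f, tt)
    get!(fake_compilecache, (typeof(f), tt)) do
        "compiled code for $(f)$(tt)"   # stand-in for real PTX generation
    end
end

foo() = nothing
first_result = fake_compile(foo, Tuple{})    # compiles and caches

foo() = 1                                    # redefinition...
second_result = fake_compile(foo, Tuple{})   # ...same key, stale entry returned
```

Emptying the dictionary would force recompilation, but as the EDIT above notes, there is further caching beyond this layer.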

bors try

vchuravy avatar Feb 21 '19 21:02 vchuravy

try

Build failed

bors[bot] avatar Feb 21 '19 22:02 bors[bot]

I agree that losing the ability to redefine is annoying.

Regarding:

julia> function kernel1(T, i)
           sink(i)
           return
       end
kernel1 (generic function with 1 method)

julia> @cuda kernel1(Int, 1)
ERROR: InvalidIRError: compiling #103(Type{Int64}, Int64) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f_tuple)
Stacktrace:
 [1] #103 at /home/tbesard/Julia/CUDAnative/src/context.jl:51
Reason: unsupported call to the Julia runtime (call to jl_f_getfield)
Stacktrace:
 [1] #103 at /home/tbesard/Julia/CUDAnative/src/context.jl:51

The issue is that Cassette emits a call to `overdub(cudactx, Main.sink, i)`, which causes the `jl_f_tuple` to appear. Not sure how to fix this.

vchuravy avatar Feb 21 '19 22:02 vchuravy
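A pure-Julia illustration of the splat problem described above. `trampoline` here is a hypothetical stand-in for Cassette's generated `overdub`, not its actual implementation: a trampoline takes its arguments as a varargs tuple and re-splats them, which lowers to an `_apply`-style call on a materialized tuple. If the compiler can't elide that tuple, `jl_f_tuple` shows up in the generated (GPU-hostile) code.

```julia
# A trampoline in the style of overdub: collect args into a tuple, re-splat.
trampoline(f, args...) = f(args...)

# Inspect the lowered form: the splat goes through Core._apply
# (Core._apply_iterate on newer Julia versions), taking the args tuple.
ir = string(code_lowered(trampoline, Tuple{typeof(+), Int, Int})[1])
has_apply = occursin("_apply", ir)
```

When inlining succeeds the tuple is elided and no runtime call remains; the failures in this thread are cases where it doesn't.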

argument count: Error During Test at /builds/JuliaGPU/CUDAnative.jl/test/device/execution.jl:440
  Got exception outside of a @test
  InvalidIRError: compiling #103(Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64) resulted in invalid LLVM IR
  Reason: unsupported call to the Julia runtime (call to jl_f__apply)
  Stacktrace:
   [1] #103 at /builds/JuliaGPU/CUDAnative.jl/src/context.jl:56
  Stacktrace:
   [1] check_ir(::CUDAnative.Com

Looks like the tuple limit.

bors try

vchuravy avatar Feb 21 '19 22:02 vchuravy

try

Build failed

bors[bot] avatar Feb 21 '19 22:02 bors[bot]

Ok that reduces it down to:

  • dummy
  • 265 for Cassette, https://github.com/jrevels/Cassette.jl/issues/6
  • https://github.com/JuliaGPU/CUDAnative.jl/pull/334#issuecomment-466184514
  • Traces now have `overdub` in them; it would be lovely to filter those out

vchuravy avatar Feb 21 '19 22:02 vchuravy