WIP: Use contextual dispatch for replacing functions
On Julia 1.1, Cassette should be performant enough for these kinds of transforms.
Fixes https://github.com/JuliaGPU/CUDAnative.jl/issues/27
@maleadt did you have a branch similar to this around?
bors try
Yes, https://github.com/JuliaGPU/CUDAnative.jl/compare/tb/cassette. It didn't work because of plenty of allocations, invokes, dispatches, etc. Is your approach different in that regard? Also, #265.
bors try
As bors tells us, apparently not ;)
@jrevels https://gitlab.com/JuliaGPU/CUDAnative.jl/-/jobs/153739960 is full of interesting cases.
bors try
Yeah, as I feared... Let's mark this WIP then :slightly_frowning_face:
bors try
Same error count; inlining doesn't help.
That said, many stack traces point to `getindex` again, so maybe there's only a small number of errors remaining. I'll have another go at reducing `vadd` when I have some time.
I was planning on grabbing Jarrett this week to see if we can figure it out. (I am in the process of adding GPU support to Cthulhu, so that should make it easier.)
bors try
Ok! The debugging session with Jarrett proved fruitful, we are down to 10ish failures :)
Cool! What were the changes?
We applied my usual Cassette issue workaround of "isolate the problematic thing and make it a contextual primitive (i.e. don't overdub into it)". The problematic thing here was the `@pure` function `datatype_align`.
It turns out that while Cassette propagates purity to the compiler correctly, the compiler is (probably rightfully) pessimistic and just bails out of purity optimization for generated functions (i.e. `overdub`). Ref https://github.com/JuliaLang/julia/pull/31012, which is my naive attempt at changing the compiler to allow this sort of thing. If that lands, we can remove the extra contextual primitive definition here.
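Concretely, the workaround looks roughly like the following (a sketch using the standard Cassette pattern; `CUDACtx` is an illustrative context name, not necessarily the one in this PR):

```julia
using Cassette

Cassette.@context CUDACtx   # illustrative name for the CUDAnative context

# Turn `Base.datatype_align` into a contextual primitive: a specific `overdub`
# method means Cassette calls it directly instead of recursing into its
# (@pure) implementation through a generated function.
Cassette.overdub(::CUDACtx, ::typeof(Base.datatype_align), T) =
    Base.datatype_align(T)
```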
bors try
I think we are down to two Cassette-related issues, while the rest is adjusting tests / the extra level of indirection messing things up.
```julia
julia> function kernel1(T, i)
           sink(i)
           return
       end
kernel1 (generic function with 1 method)

julia> @cuda kernel1(Int, 1)
ERROR: InvalidIRError: compiling #103(Type{Int64}, Int64) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f_tuple)
Stacktrace:
[1] #103 at /home/tbesard/Julia/CUDAnative/src/context.jl:51
Reason: unsupported call to the Julia runtime (call to jl_f_getfield)
Stacktrace:
[1] #103 at /home/tbesard/Julia/CUDAnative/src/context.jl:51
```
```julia
julia> inner_kwargf(foobar; foo=1, bar=2) = nothing
inner_kwargf (generic function with 1 method)

julia> @cuda (()->inner_kwargf(42; foo=1, bar=2))()
ERROR: GPU compilation of #103() failed
KernelError: kernel returns a value of type `Any`
```
Some more obscure errors as well, but these are the obvious codegen-related ones.

bors try
I really dislike the loss of method redefinition support though, so either we need a proper fix or a hack (like emptying the CUDAnative compile cache upon every REPL execution -- but we don't have a useful REPL API for that) to support redefinitions.
EDIT: even emptying the compile cache isn't sufficient, there's other caching going on
```julia
# valid def
julia> foo() = nothing
julia> @cuda foo()

# invalid def
julia> foo() = 1
julia> @cuda foo()
# works, too bad

# I expected this to fail
julia> empty!(CUDAnative.compilecache); @cuda foo()

# to show the def is really invalid
julia> bar() = 1
julia> @cuda bar()
ERROR: GPU compilation of #103() failed
KernelError: kernel returns a value of type `Int64`
```
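For the "proper fix" side, one direction could be to make the cache itself redefinition-aware. A minimal sketch, assuming a simple dictionary cache and a placeholder `compile_kernel` (neither is CUDAnative's actual code): key entries on the current world age, which Julia bumps on every method (re)definition, so a redefined kernel never hits a stale entry.

```julia
# Sketch: a redefinition-aware compile cache keyed on the current world age.
compile_kernel(f, tt) = (f, tt)  # placeholder for the real GPU compilation step

const kernel_cache = Dict{Tuple{Any,Any,UInt},Any}()

function cached_kernel(f, tt)
    world = Base.get_world_counter()  # bumped whenever any method is (re)defined
    get!(kernel_cache, (f, tt, world)) do
        compile_kernel(f, tt)
    end
end
```

This is overly conservative (any new definition anywhere produces a cache miss) and it wouldn't touch the other caching layers mentioned above, but it sketches the idea.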
bors try
I agree that losing the ability to redefine is annoying.
Regarding:
```julia
julia> function kernel1(T, i)
           sink(i)
           return
       end
kernel1 (generic function with 1 method)

julia> @cuda kernel1(Int, 1)
ERROR: InvalidIRError: compiling #103(Type{Int64}, Int64) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f_tuple)
Stacktrace:
[1] #103 at /home/tbesard/Julia/CUDAnative/src/context.jl:51
Reason: unsupported call to the Julia runtime (call to jl_f_getfield)
Stacktrace:
[1] #103 at /home/tbesard/Julia/CUDAnative/src/context.jl:51
```
The issue is that Cassette emits a call to `overdub(cudactx, Main.sink, i)`, which is what causes the `jl_f_tuple` to appear. Not sure how to fix this.
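For context, here is a minimal self-contained sketch of what that transformation boils down to (`DemoCtx` and the placeholder `sink` are illustrative, and this is a simplification of what Cassette actually generates):

```julia
using Cassette

Cassette.@context DemoCtx   # stand-in for CUDAnative's context type

sink(x) = x                 # placeholder for the thread's `sink`

# Inside the overdubbed kernel, the nested call `sink(i)` effectively becomes a
# contextual call, with the context and callee threaded through as extra
# arguments. If that call isn't fully specialized, the argument tuple gets
# built at runtime, which is where jl_f_tuple / jl_f_getfield come from.
function overdubbed_kernel1(ctx, T, i)
    Cassette.overdub(ctx, sink, i)
    return
end

overdubbed_kernel1(DemoCtx(), Int, 1)
```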
```
argument count: Error During Test at /builds/JuliaGPU/CUDAnative.jl/test/device/execution.jl:440
Got exception outside of a @test
InvalidIRError: compiling #103(Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f__apply)
Stacktrace:
[1] #103 at /builds/JuliaGPU/CUDAnative.jl/src/context.jl:56
Stacktrace:
[1] check_ir(::CUDAnative.Com
```
Looks like the tuple limit.
bors try
Ok that reduces it down to:
- #265 for Cassette, https://github.com/jrevels/Cassette.jl/issues/6
- https://github.com/JuliaGPU/CUDAnative.jl/pull/334#issuecomment-466184514
- Traces now have `overdub` in them, would be lovely to filter those out (see the sketch below)
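On the last point, here is a rough sketch of the kind of trace filtering that could help (purely illustrative, not an existing CUDAnative or Cassette feature): drop stack frames whose function is Cassette's `overdub` before printing.

```julia
# Sketch: hide Cassette's `overdub` frames from a backtrace before printing it.
filter_overdub(trace::Vector{Base.StackTraces.StackFrame}) =
    filter(frame -> frame.func !== :overdub, trace)

# Host-side example: print a cleaned-up trace.
for frame in filter_overdub(stacktrace())
    println(frame)
end
```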