Create a CUDA context
Thanks to IRTools.jl, we can do some nifty things with Julia IR, like using a dynamo to recursively walk through the IR of a call and offload sensible ops to the GPU.
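For readers unfamiliar with dynamos, the core pattern looks roughly like this (a minimal sketch following the IRTools docs, not this PR's actual implementation — a real pass would rewrite array calls to CUDA equivalents instead of leaving the IR unchanged):

```julia
using IRTools
using IRTools: IR, @dynamo, recurse!

# A dynamo receives the types of a call and returns (possibly transformed) IR.
@dynamo function roundtrip(a...)
  ir = IR(a...)
  ir === nothing && return   # no Julia IR available (e.g. intrinsics): run normally
  recurse!(ir)               # rewrite nested calls so they also go through roundtrip
  return ir
end
```

Calling `roundtrip(f, args...)` then executes `f(args...)` while walking its whole call graph, which is where a GPU-offloading pass gets its hooks in.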
```julia
julia> c = Conv((3,3), 3 => 16, relu, pad = (1,1)); # from Flux

julia> r = rand(Float32, 32, 32, 3, 100);

julia> cuda() do
           c(r)
       end # run on GPU
```

```julia
julia> a = rand(Float32, 5*10^4);

julia> b = rand(Float32, 5*10^4);

julia> cuda() do
           a + b
       end
50000-element Array{Float32,1}:
 0.9649581
 1.2122422
 0.423553
 ⋮
```
Notice that the return type is a normal Array, meaning that without much fiddling it is trivial to offload computation to the GPU and pick up where you left off.
There are a couple of caveats: not all functions behave nicely yet, and we need better test coverage. But I'm opening this now to get some review and a sense of direction for the way forward.
cc @MikeInnes
ref https://github.com/JuliaGPU/CuArrays.jl/issues/303
Thanks! What is driving the choice to use IRTools over Cassette? I would prefer the maintenance burden to rest with Cassette (i.e. me).
The choice was made for the slightly nicer control over the IR that IRTools offers. It's also conceptually simpler, so maintaining it should be easier. It was also fairly straightforward to define in less code, making it more readable. Mind you, I'm no Cassette pro, but it's definitely worth a discussion.
@vchuravy there probably isn't much in it, so if the lead maintainers of this package strongly prefer Cassette then I imagine it'd be OK to port it over.
Though as Dhairya points out, there are a couple of potential advantages to fine-grained control of the IR pass; the main one is that it's easier to cut out classes of functions we're not interested in, e.g. intrinsics or certain modules in Base, avoiding some redundant recompilation.
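Concretely, cutting out those classes of functions could look something like the following sketch (`IGNORED_MODULES` and `should_recurse` are illustrative names, not this PR's API):

```julia
# Decide whether the IR pass should recurse into a function, skipping
# builtins/intrinsics and functions owned by modules we never transform.
const IGNORED_MODULES = (Base, Core, Core.Intrinsics)

function should_recurse(f)
  f isa Core.Builtin && return false            # getfield, arrayref, ...
  m = parentmodule(typeof(f))                   # module that owns the function
  return !(m in IGNORED_MODULES)
end
```

The dynamo would consult such a predicate before calling `recurse!` on a nested call, leaving everything else to run natively and avoiding redundant recompilation.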
Very interesting! Looking forward to giving this a spin, might open up some nice new ways of doing GPU computation.
I guess we'll need some way to assert GPU execution to actually test this?
Yeah, for the tests I was thinking of having a context we can inspect, to assert that the array actually lives in it and corresponds to memory associated with the GPU.
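Such a testing context might be sketched like this (`TestContext`, `mark!`, and `ondevice` are hypothetical names for illustration, not CuArrays API):

```julia
# A context that records which arrays were "moved to the device", so tests
# can assert that offloading really happened without needing a GPU.
struct TestContext
  device_ids::Set{UInt}   # objectids of arrays the context placed on the device
end
TestContext() = TestContext(Set{UInt}())

# Record an array as device-resident and return it unchanged.
mark!(ctx::TestContext, x) = (push!(ctx.device_ids, objectid(x)); x)

# Did this context ever move `x` to the device?
ondevice(ctx::TestContext, x) = objectid(x) in ctx.device_ids
```

In a real test, `cuda() do ... end` would run with such a context installed, and the test would assert `ondevice` on the buffers it expects to have been offloaded.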
Grrml Gmail ate my reply:
Since CUDAnative will use Cassette and GPUifyLoops already does, I would strongly prefer having only one tool in the GPU ecosystem for this. I would be interested in making the IRTools transforms/utility functions work with Cassette, which should be relatively straightforward.