Tullio.jl

FR: Simplify loading with CUDA

Open marius311 opened this issue 3 years ago • 4 comments

For @tullio to generate a GPU version, you currently need to run "using KernelAbstractions, CUDAKernels, CUDA" before invoking the macro. This makes it a pain if you want to use Tullio in some library package which is GPU-optional and has some @require CUDA ... code. The only solution I've found is, in your own library package, to have a @require CUDA block and then re-include your own code which uses @tullio, so that Tullio generates the necessary GPU versions. This is pretty hacky, breaks precompilation, and often doesn't play nicely with multiple processes for some reason.

It would be great if Tullio could be defined in a more standard way that didn't necessitate doing this.
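To make the ordering problem concrete, here is a minimal sketch using only Base Julia. The macro `@pick_backend` and the flag `FAKE_GPU_LOADED` are hypothetical stand-ins (they are not part of Tullio); the only assumption is the one stated above, that the macro decides what to generate from what is loaded when it is expanded.

```julia
# Hypothetical stand-in for a macro like @tullio: it chooses a branch
# at macro-expansion time, based on what is defined in the calling module.
macro pick_backend()
    QuoteNode(isdefined(__module__, :FAKE_GPU_LOADED) ? :gpu : :cpu)
end

early() = @pick_backend()      # expanded before the "GPU package" exists

const FAKE_GPU_LOADED = true   # simulates loading the GPU packages later

late() = @pick_backend()       # expanded afterwards

# `early` keeps the branch chosen at its own expansion time,
# even though the flag is defined by the time it is called.
println((early(), late()))
```

A function defined before the GPU packages are loaded keeps its CPU-only expansion, which is why the code has to be evaluated again after loading.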

marius311 Jul 13 '22 23:07

This would be nice.

At the moment it calls KernelAbstractions.@kernel at macro-expansion-time, for the contents of a function which dispatches on Type{<:CuArray}. And @kernel wants, I think, to decide what code to generate based on what loops it sees.

It might be possible to do better if you instead stored what the macro sees in a type, and had a generated function write the code that's necessary, at compile time. This is (I think) what LoopVectorization does (and what ArrayMeta did). The main reason Tullio doesn't is that this is much harder to work on & debug. And I'm not certain it would help here -- or perhaps would need a similar re-write of KernelAbstractions.
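As a rough illustration of that idea (not how Tullio, LoopVectorization, or ArrayMeta actually implement it), the sketch below has a macro that only records what it saw in a type parameter, and a generated function that writes the loop once the argument types are known. All names here (`Reduction`, `run_reduction`, `@reduce_with`) are hypothetical, and only `+` and `*` are handled.

```julia
# The macro's "memory": the operator is stored as a Symbol in a type parameter.
struct Reduction{op} end

# Compile time: runs once per combination of argument types and returns the
# code to compile. Here `op` is the Symbol and `T` is the element type.
@generated function run_reduction(::Reduction{op}, xs::AbstractVector{T}) where {op, T}
    init = op === :+ ? :(zero($T)) : :(one($T))
    quote
        acc = $init
        for x in xs
            acc = $op(acc, x)
        end
        acc
    end
end

# Macro-expansion time: no loop is generated here, only a call that carries
# the operator name along in the type.
macro reduce_with(op, xs)
    op isa Symbol || error("expected an operator name such as + or *")
    :(run_reduction(Reduction{$(QuoteNode(op))}(), $(esc(xs))))
end

total = @reduce_with(+, [1, 2, 3])
product = @reduce_with(*, [1.0, 2.0, 4.0])
```

Because the generated function sees the array type, the choice between a CPU and a GPU body could in principle be made there instead of in the macro. Whether `@kernel` could be driven this way is exactly the open question raised above.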

Maybe someone else has a better idea, though?

mcabbott Jul 24 '22 22:07

Yeah, I think the general strategy is right: the macro only stores the needed info at parse time, and the kernel itself is built at compile time. But I don't have anything valuable to add that I'm sure you don't already know. FWIW, and in case it's helpful to others, my current solution is to decorate any function that uses @tullio with

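using Requires  # provides @require (and @init)

# Define the function now, and define it again once CUDA is loaded,
# so that @tullio is re-expanded with the GPU packages visible.
macro uses_tullio(funcdef)
    quote
        $(esc(funcdef))  # initial definition, without GPU methods
        @init @require CUDA="052768ef-5323-5732-b1bb-66c8b64840ba" begin
            using KernelAbstractions, CUDAKernels, CUDA
            $(esc(funcdef))  # same definition again, after the GPU packages load
        end
    end
end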

which will redefine the function once CUDA is loaded. This is better than re-including entire files, which leads to unnecessary invalidations and other errors.
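The mechanism can be tried without Requires or CUDA. In the sketch below, everything is a hypothetical stand-in: `LOAD_HOOKS` plays the role of the callback that `@require` would register, `@pick_backend` plays the role of `@tullio`, and `@define_twice` has the same shape as the decorator above.

```julia
# Hypothetical stand-in for the callbacks that @require would register.
const LOAD_HOOKS = Function[]

# Hypothetical stand-in for @tullio: picks a branch when it is expanded.
macro pick_backend()
    QuoteNode(isdefined(__module__, :FAKE_GPU_LOADED) ? :gpu : :cpu)
end

# Define the function now, and queue the same definition for later.
macro define_twice(funcdef)
    mod = __module__
    quote
        $(esc(funcdef))
        # Store the unexpanded definition, so inner macros are expanded again
        # when the hook runs.
        push!(LOAD_HOOKS, () -> Core.eval($mod, $(QuoteNode(funcdef))))
        nothing
    end
end

@define_twice backend() = @pick_backend()
before = backend()                     # expanded without the flag

const FAKE_GPU_LOADED = true           # simulates the GPU package being loaded
foreach(hook -> hook(), LOAD_HOOKS)    # what the @require callback would do
after = backend()                      # the method has been replaced
```

The key detail is that the function definition is kept unexpanded, so the inner macro runs a second time and sees the newly loaded state.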

marius311 Aug 03 '22 19:08

This isn't a crazy idea. IIRC at some point Tullio defined the functions it needed globally, by calling eval. The name of the function was either gensym, or else a hash of its contents. With such a scheme, the CUDA version could be within an @require block.

The reason I switched it to the current behaviour of defining functions only in local scope, within a let block, was roughly to avoid eval and be a more normal macro. The global functions are harder to inspect, and with gensym you'll get a new one every time the macro is re-run; with hash I think I occasionally had confusion when updating the package & getting an old definition.
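The difference between the two naming schemes can be seen with Base Julia alone. This is only a sketch of the naming step, under the assumption that the name is built from the expression the macro receives; the prefix `kernel_` and the example expression are made up for illustration.

```julia
# What the macro might see; the index expression is only an example.
ex1 = :(C[i, k] = A[i, j] * B[j, k])
ex2 = :(C[i, k] = A[i, j] * B[j, k])   # the same text, parsed a second time

# gensym: a fresh name on every call, so each re-run defines a new function.
g1 = gensym(:kernel)
g2 = gensym(:kernel)

# Content hash: structurally equal expressions give the same name.
h1 = Symbol(:kernel_, hash(ex1))
h2 = Symbol(:kernel_, hash(ex2))

println(g1 == g2)   # false
println(h1 == h2)   # true
```

With the hash scheme, the same input maps to the same global name even if the code that generates the function body has changed, which may be the source of the stale definitions mentioned above.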

mcabbott Aug 22 '22 20:08