Valentin Churavy
Not fully up to speed here, but my hope is #582 will help with these situations. But currently it does still create the issue that Enzyme wants to run some...
That would be wicked, but currently:

```
function mykernel_grad(x, dx)
    autodiff_deferred(mykernel, Duplicated(x, dx))
end

@cuda mykernel_grad
```

And so we never enter an Enzyme scope on the CPU.
cc: @pchintalapudi @gbaraldi for your pipeline expertise.
So I am only deleting top-level kernel calls. Since everything else is re-usable.
@maleadt are we tracking anywhere how big the modules that we load onto the GPU are?
Which part should have been recompiled? When I went through the stack, the `sm` influenced the native code generation, not the inference result. Since the native code goes into a...
Yeah. My mental model is that we want a third cache layer. Initially I thought adding a field to CI would be helpful (https://github.com/JuliaLang/julia/pull/53255), but it really is more a...
For now just rebased and included the code from Shenanigans.jl
@aviatesk can you take a look if this makes sense? The goal here is to have a stack of three method tables, the first layer superseding the second, the second...
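To make the layering idea concrete, here is a minimal sketch of a priority-ordered lookup across three tables. It is only an illustration: `LayeredTable` and `lookup` are hypothetical names, plain `Dict`s stand in for real method tables, and it assumes each layer shadows all layers after it (the comment above is cut off before it describes the lower layers). It does not use or mirror the compiler's actual method table API.

```julia
# Hypothetical sketch: three stacked lookup tables where earlier layers win.
# Plain Dicts stand in for method tables; this is NOT the compiler's API.
struct LayeredTable
    layers::Vector{Dict{Symbol,String}}  # ordered from highest to lowest priority
end

# Return the entry from the first layer that defines `name`, or `nothing`.
function lookup(t::LayeredTable, name::Symbol)
    for layer in t.layers
        haskey(layer, name) && return layer[name]
    end
    return nothing
end

# Assumption: each layer shadows every layer after it.
first_layer  = Dict{Symbol,String}(:sin => "first")
second_layer = Dict{Symbol,String}(:sin => "second", :cos => "second")
third_layer  = Dict{Symbol,String}(:sin => "third", :cos => "third", :tan => "third")

table = LayeredTable([first_layer, second_layer, third_layer])
```

With this setup, `lookup(table, :sin)` resolves in the first layer, `lookup(table, :cos)` falls through to the second, and `lookup(table, :tan)` is only found in the third.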
> This issue likely applies to this PR as well. It might be worth being careful about whether there are any problems regarding this point. For our uses that is...