Julian Samaroo

413 comments by Julian Samaroo

Ok, now all calls generated by the `@kernel` macro forward down to `KernelAbstractions.construct`, which is defined for `::Device`, and ROCKernels defines its own for `::ROCDevice`.

Yeah, the MPI failures seem to be something specific to CI. Let's hold off on merging this until:
- We've tested that this works properly for CUDAKernels (since we change...

https://github.com/JuliaGPU/AMDGPU.jl/pull/280

I want to recommend using AMDGPU's behavior, but it requires intrusive changes within the GPU array objects to support it, and likely adds some overhead during kernel launch (to search...

So CUDAKernels/ROCKernels are the packages you need to load to get CUDA/AMDGPU support with KA, respectively. They each export `CUDADevice`/`ROCDevice`, which can be passed as the first argument (instead...

I suspect running with opaque pointers in Julia 1.10 will resolve this, as the pointer size is not hard-coded into the IR.

> I suspect the main difficulty is the way that TimeDag.run_node! currently works by mutating the node state [1]. I assume (?) that this is an awkward thing to support...

Hey @RuyiDu, not sure if you're still looking for an answer to this question (which I think is a valid one). I just got uGUI running on my STM32F746-Discovery...

Looking at `@code_llvm`, returning a `Tuple` uses the `sret` calling convention, which means that the first argument to the function is a slot allocated on the stack that the result...