James Schloss

Results: 253 comments by James Schloss

You are right. All the `XArrayBackends` can be removed. Let me mess around and test locally. The main thing that stalled the PR is that I couldn't figure out the...

I was literally just about to create an issue in KA about that. I'll go ahead and rebase everything for this PR (these PRs).

Just rebased (also had to revert the Enzyme stuff). All tests pass locally on AMDGPU. Could we rerun the CI to make sure the errors are consistent on each...

So the main problem is with `launch_heuristic` and `launch_configuration` as well as `KernelAbstractions.launch_config`. These are kinda conflicting and I'm not actually sure how to write the `launch_heuristic` function in `CUDA/gpuarrays.jl`....

Map goes through `launch_heuristic` on Metal's side. I might have really messed up the logic somehow? AMDGPU doesn't have `launch_heuristic` and instead falls back to the default one defined here...

Just to be clear, there are 2 options: 1. `@inbounds J_c = CartesianIndices(axes(bc))[(J-1)*nelem + j]` 2. Remove the `launch_heuristic` / `elements_per_thread` approach. You are in favor of 2 with the...

Running out of time to keep debugging this today, so I'll just write everything down. After removing the `launch_heuristic` and `elements_per_thread` approach, I am getting errors with `map` for certain...

Oh, great! Tbh, I had to put this on hold for August because our daycare is closed this month and I'm juggling childcare duties. I should be able to pick...

I couldn't quite get https://github.com/JuliaGPU/GPUArrays.jl/pull/525/commits/00c8dd4912c5d1c4c4260f0b8baecf71647d52ad to work, so I reverted it to check whether Metal would build. It seems like CUDA and AMDGPU (locally) now both pass, but...

Great! I'm working on this for like an hour or two a day, so expect a PR next week? #517 is kinda in the right direction, but I have another...