ZLUDA on Windows ARM

Is it possible to run ZLUDA on the new Qualcomm Elite chip?

May 20 '24 19:05 dkostarnov

ZLUDA is for AMD GPUs at this moment.

May 21 '24 04:05 lshqqytiger

@vosen is there any plan to support Qualcomm chipsets as part of this project? I would like to contribute to this, as I have the hardware. A guide on how to onboard a new target would help.

Jan 02 '25 04:01 sc231997

I don't have a Qualcomm GPU at hand, but I had a quick read through their docs, and Adreno does seem to be a much worse target than AMD or Intel:

- Their OpenCL seems to be missing a lot of host features that are required or semi-required (no managed allocations, no ability to implicitly use any SVM allocation). If they implemented at least cl_intel_unified_shared_memory, it would work much better.
- Not sure if their OpenCL accepts SPIR-V, but if it doesn't, then doing code emission is going to be "interesting", and there are going to be limitations.
- It seems their warp (subgroup) size is either 64 or 128. CUDA mandates 32. You can stuff two 32-wide CUDA warps into a single 64-wide hardware warp, but it's painful (I'm not sure it's doable with SPIR-V; on AMD it required an inline assembly prelude), and you always risk that some software will just hang.
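
To illustrate the lane arithmetic behind stuffing two 32-wide CUDA warps into one 64-wide hardware subgroup, here is a minimal host-side sketch in Rust. It is not ZLUDA code: the names VirtualLane, to_virtual, cuda_ballot and shuffle_source are made up for this example, and it only models the index math. It does nothing about divergence or synchronization, which is where the hang risk comes from.

```rust
/// CUDA's fixed warp width.
const CUDA_WARP: u32 = 32;

/// Where a hardware lane sits inside the emulated CUDA warps.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct VirtualLane {
    /// Index of the CUDA warp inside the hardware subgroup (0 or 1 for width 64).
    pub warp: u32,
    /// CUDA lane id within that warp, in 0..32.
    pub lane: u32,
}

/// Map a hardware lane id (0..64) to its emulated CUDA warp and lane.
pub fn to_virtual(hw_lane: u32) -> VirtualLane {
    VirtualLane {
        warp: hw_lane / CUDA_WARP,
        lane: hw_lane % CUDA_WARP,
    }
}

/// Extract the 32-bit ballot mask that CUDA warp `warp` should observe
/// from a 64-bit ballot taken over the whole hardware subgroup.
/// `warp` must be 0 or 1.
pub fn cuda_ballot(hw_ballot: u64, warp: u32) -> u32 {
    (hw_ballot >> (warp * CUDA_WARP)) as u32
}

/// Translate the source lane of a CUDA shuffle into a hardware lane id,
/// keeping the shuffle inside the caller's own emulated warp.
pub fn shuffle_source(hw_lane: u32, cuda_src_lane: u32) -> u32 {
    let base = (hw_lane / CUDA_WARP) * CUDA_WARP;
    base + (cuda_src_lane % CUDA_WARP)
}
```

For example, hardware lane 40 becomes lane 8 of the second emulated warp, and a shuffle from CUDA lane 3 issued by that lane has to read hardware lane 35 rather than hardware lane 3. Every warp-level operation (ballot, shuffle, vote, match) needs this kind of rewriting, which is part of why the approach is painful.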

@sc231997 That being said, any card is welcome as long as there's someone who can contribute. Just keep your expectations in check: with a Qualcomm GPU you probably won't get anything even mildly complex to run. To get started, try to make the simple addition test in the compiler pass work on your GPU: cargo test -p ptx -- "::add". The steps would be:

In ptx:
- change the test_ptx! macro to also generate a test for your GPU
- add code to emit SPIR-V for your GPU (it should work similarly to emit_llvm.rs; you can look at older git revisions to see how the previous Intel GPU SPIR-V emitter worked)
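
As a rough picture of the first step, the sketch below shows the general shape of a macro that expands one test description into one check per backend. This is not the real test_ptx! macro: test_ptx_sketch!, Backend and compile_and_run_add are hypothetical names, and the "run" step is a host-side stand-in that assumes the test kernel adds 1 to its input, so the example stays self-contained.

```rust
/// Hypothetical set of code-generation targets; not ZLUDA's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Backend {
    Llvm,
    SpirV,
}

/// Stand-in for "compile the PTX for `backend`, run it on the device,
/// read back the result". Here it only evaluates the addition on the
/// host (assumption: the kernel under test computes input + 1).
pub fn compile_and_run_add(_backend: Backend, input: u64) -> u64 {
    input + 1
}

/// Expands a single test description into one function per backend,
/// grouped in a module named after the test.
macro_rules! test_ptx_sketch {
    ($name:ident, $input:expr, $expected:expr) => {
        pub mod $name {
            use super::{compile_and_run_add, Backend};

            // In a real test suite these functions would carry #[test].
            pub fn llvm() {
                assert_eq!(compile_and_run_add(Backend::Llvm, $input), $expected);
            }

            pub fn spirv() {
                assert_eq!(compile_and_run_add(Backend::SpirV, $input), $expected);
            }
        }
    };
}

// One invocation yields add::llvm and add::spirv.
test_ptx_sketch!(add, 1u64, 2u64);
```

The point of the design is that every existing test invocation picks up the new backend automatically, so failures on the new GPU show up test by test instead of requiring a separate test list.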

Then you'd need to start worrying about host code, but that is a much easier mapping (when it's possible).

Jan 02 '25 16:01 vosen