[SYCL][Runtime] Hierarchical Parallelism/parallel_for_workgroup port for Xilinx FPGA
Currently parallel_for_workgroup is implemented for Intel devices, but will not work on Xilinx devices. It should hopefully be a very easy fix, but it's low priority for us at the time of writing this issue (the construct has limited value for FPGA).
For the most part, it should just take some work to find SPIR equivalents of the SPIR-V intrinsics used inside group (these should just be barriers and fences): https://github.com/triSYCL/sycl/blob/sycl/unified/next/sycl/include/CL/sycl/group.hpp#L36
The change then needs to be implemented in a way that does not break the Intel implementation while allowing it to work for Xilinx FPGA. This can, and probably should, be done similarly to the existing SPIR SIMT intrinsics (get_global_id, get_global_size, etc.).
Most of the legwork should already be there: it should just be a matter of understanding how we handle the existing spir-df intrinsics and learning to do the same for the barrier intrinsics! That is true at the time of writing and may change over time. If you can see a better way of doing the intrinsics than the one that currently exists, by all means feel free to improve it (we're not married to the current implementation, it's a means to an end!).
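As a rough illustration of the "same front-end call, different intrinsic per target" idea described above, here is a minimal host-only sketch. Everything in it is an assumption made for illustration: the tag types, the `EXAMPLE_TARGET_XILINX_SPIR` macro, the `fake_*` stand-ins, and the numeric constants are hypothetical and are not the names or values used in the actual runtime. On a real device the two stand-ins would be compiler-provided builtins (a SPIR-V style control barrier on one side, an OpenCL C style `barrier(flags)` on the other).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Records which stand-in was called, so the dispatch can be checked on the host.
inline std::vector<std::string>& call_log() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical stand-in for a SPIR-V style control barrier builtin.
inline void fake_spirv_control_barrier(std::uint32_t exec_scope,
                                       std::uint32_t mem_scope,
                                       std::uint32_t semantics) {
  (void)exec_scope; (void)mem_scope; (void)semantics;
  call_log().push_back("spirv");
}

// Hypothetical stand-in for an OpenCL C / SPIR style barrier(flags) builtin.
inline void fake_spir_barrier(std::uint32_t flags) {
  (void)flags;
  call_log().push_back("spir");
}

// Hypothetical tag types naming the two back ends.
struct spirv_backend {};
struct spir_backend {};

// Illustrative constants only; real values come from the respective specs.
constexpr std::uint32_t example_workgroup_scope = 2;
constexpr std::uint32_t example_semantics = 0;
constexpr std::uint32_t example_local_fence = 1;

// One overload per back end; the group class would only call workgroup_barrier().
inline void workgroup_barrier(spirv_backend) {
  fake_spirv_control_barrier(example_workgroup_scope, example_workgroup_scope,
                             example_semantics);
}
inline void workgroup_barrier(spir_backend) {
  fake_spir_barrier(example_local_fence);
}

// Hypothetical build-time switch between the two back ends.
#ifdef EXAMPLE_TARGET_XILINX_SPIR
using selected_backend = spir_backend;
#else
using selected_backend = spirv_backend;
#endif

inline void workgroup_barrier() { workgroup_barrier(selected_backend{}); }
```

Calling `workgroup_barrier()` from the group code keeps the Intel path untouched when the macro is not defined, which is the property the issue asks for; whether a macro, a tag, or a later LLVM pass is the right switch is left open here.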
With the addition of https://github.com/intel/llvm/pull/345/files, this task is likely to be less simple, as it may require altering the Clang CodeGen a little in SYCLLowerIR/LowerWGScope so that it generates SPIR instead of SPIR-V functions. Alternatively, we can create an LLVM pass that transforms the resulting SPIR-V builtins to SPIR or some replacement. The CodeGen component is required to achieve the correct SYCL scope/address space semantics for hierarchical parallelism.
As discussed, we do not really care about hierarchical parallelism for FPGA with SPIR or SPIR-V, because FPGA programming is not about handling a plate of thread spaghetti as on a GPU.
You can implement everything in a single_task and you get back into a Vivado HLS world.
Just use for example https://github.com/triSYCL/triSYCL/blob/d6abe3c60250ed22891904e685c425c039360f5d/include/triSYCL/parallelism/detail/parallelism.hpp#L221
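To make the single_task suggestion concrete, here is a minimal host-only sketch of hierarchical parallelism expressed as plain nested loops, which is the kind of code an HLS flow can consume directly. The helper names `for_each_work_group`, `for_each_work_item`, and `scale_by_two` are hypothetical and are not taken from triSYCL or any SYCL implementation; the sketch also assumes `local_size` is greater than zero.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: runs the work-group body once per group, sequentially.
template <typename GroupBody>
void for_each_work_group(std::size_t num_groups, GroupBody body) {
  for (std::size_t g = 0; g < num_groups; ++g)
    body(g);
}

// Hypothetical helper: runs the work-item body once per local id, sequentially.
template <typename ItemBody>
void for_each_work_item(std::size_t local_size, ItemBody body) {
  for (std::size_t l = 0; l < local_size; ++l)
    body(l);
}

// Example "kernel": doubles every element, written in the hierarchical style.
// Assumes local_size > 0.
inline std::vector<int> scale_by_two(const std::vector<int>& in,
                                     std::size_t local_size) {
  std::vector<int> out(in.size());
  // Round up so a partially filled last group is still visited.
  const std::size_t num_groups = (in.size() + local_size - 1) / local_size;

  for_each_work_group(num_groups, [&](std::size_t g) {
    // Code at this level runs once per work-group.
    const std::size_t base = g * local_size;

    for_each_work_item(local_size, [&](std::size_t l) {
      const std::size_t i = base + l;
      if (i < in.size())  // guard the partially filled last group
        out[i] = 2 * in[i];
    });
    // The inner loop has fully completed here, so no explicit barrier is needed
    // between consecutive work-item phases in this sequential form.
  });
  return out;
}
```

Because each phase is an ordinary loop that finishes before the next one starts, the barrier intrinsics discussed above never appear, which is why the construct reduces to regular loop code inside a single_task.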