Investigate improvements to Kernel Bundle workaround for querying max workgroup size

Open danhoeflinger opened this issue 2 years ago • 0 comments

In a number of algorithms within oneDPL, we compile kernels and then query the exact workgroup size of each compiled kernel. This works around get_info<sycl::info::device::max_work_group_size>() returning incorrect values for CPU and FPGA_Emulator targets in some cases.

In practice, the problematic workgroup sizes appear to be extremely large values, and using them exhausts some other resource at runtime. Further investigation is needed to confirm that this is the only kind of "untrustworthy" value that requires the workaround. If it is, we should consider instead limiting the workgroup size to some "reasonable" value which does not hinder performance.

This could significantly reduce the complexity of the workaround. Exact kernel name generation and compilation of kernels would no longer be required, and could be replaced by simply capping the max workgroup size at a reasonable value.

This could also improve performance, since the kernel bundle workaround adds overhead, especially for smaller kernel invocations.
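
As a rough illustration of the capping idea described above, here is a minimal sketch in plain C++. It is not oneDPL code: the helper name `clamp_max_wg_size` and the cap of 1024 are hypothetical placeholders, since the issue does not yet propose a concrete "reasonable" value. The `reported_max` argument stands in for whatever get_info<sycl::info::device::max_work_group_size>() returns on the device.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical cap, chosen only for illustration. The issue leaves the
// actual "reasonable" value open pending further investigation.
constexpr std::size_t assumed_reasonable_wg_cap = 1024;

// Hypothetical helper: limits a device-reported max workgroup size to a cap.
// `reported_max` stands in for the value returned by
// device.get_info<sycl::info::device::max_work_group_size>().
inline std::size_t clamp_max_wg_size(std::size_t reported_max,
                                     std::size_t cap = assumed_reasonable_wg_cap)
{
    // Trust the reported value only up to the cap; implausibly large
    // values are replaced by the cap itself.
    return std::min(reported_max, cap);
}
```

With a helper like this, an implausibly large reported value would be reduced to the cap, while smaller, trustworthy values would pass through unchanged. No kernel would need to be compiled before choosing a workgroup size. Whether a single fixed cap is safe for every affected target is exactly what the investigation would need to confirm.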

Dec 14 '23 19:12 danhoeflinger