Implement L0 cooperative kernel functions
Defines urKernelSuggestMaxCooperativeGroupCountExp and urEnqueueCooperativeKernelLaunchExp to enable cooperative kernels with more than one work group.
SYCL tests for this PR are passing here: https://github.com/intel/llvm/pull/13653.
The current implementation of urEnqueueCooperativeKernelLaunchExp is nearly identical to urEnqueueKernelLaunch. It differs in a few minor ways and launches the kernel with zeCommandListAppendLaunchCooperativeKernel. I'm not sure whether there's a better way to define it that reuses code, or whether the preference is to keep the implementations separate so that they can diverge in the future.
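For discussion, one way the code-reuse option could look is a shared helper that takes a flag and picks the append call. This is only a minimal sketch with stand-in types, not the adapter's actual code: `FakeCommandList`, `GroupCount`, `Result`, `enqueueKernelLaunchImpl`, and the two `append...` functions are hypothetical placeholders for the real Level Zero calls, and the real entry points take many more parameters (queue, events, work sizes, and so on).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-in types so the sketch compiles without the Level Zero or UR headers.
struct GroupCount {
  uint32_t x, y, z;
};
enum class Result { Success, InvalidArgument };

// Hypothetical command list that only records which append call was made.
struct FakeCommandList {
  std::vector<std::string> calls;
};

// Placeholder for the regular launch call (zeCommandListAppendLaunchKernel).
Result appendLaunchKernel(FakeCommandList &cl, const GroupCount &) {
  cl.calls.push_back("regular");
  return Result::Success;
}

// Placeholder for zeCommandListAppendLaunchCooperativeKernel.
Result appendLaunchCooperativeKernel(FakeCommandList &cl, const GroupCount &) {
  cl.calls.push_back("cooperative");
  return Result::Success;
}

using AppendFn = Result (*)(FakeCommandList &, const GroupCount &);

// Shared body: validation and setup live here once; only the final
// append call depends on whether the launch is cooperative.
Result enqueueKernelLaunchImpl(FakeCommandList &cl, const GroupCount &gc,
                               bool cooperative) {
  if (gc.x == 0 || gc.y == 0 || gc.z == 0)
    return Result::InvalidArgument;
  AppendFn append =
      cooperative ? appendLaunchCooperativeKernel : appendLaunchKernel;
  return append(cl, gc);
}

// Thin wrappers standing in for the two public entry points.
Result enqueueKernelLaunch(FakeCommandList &cl, const GroupCount &gc) {
  return enqueueKernelLaunchImpl(cl, gc, /*cooperative=*/false);
}

Result enqueueCooperativeKernelLaunch(FakeCommandList &cl,
                                      const GroupCount &gc) {
  return enqueueKernelLaunchImpl(cl, gc, /*cooperative=*/true);
}
```

The trade-off is the one raised above: a shared helper removes the duplication today, but any future divergence between the two paths would have to be threaded through the flag or split back out.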
It would be great to have some UR tests for cooperative kernels now that there is an implementation. Is there a plan for that, @0x12CC?
FYI, that doesn't block this PR, which is now in the merge queue.