llvm icon indicating copy to clipboard operation
llvm copied to clipboard

[SYCL] Add num_dim_reqd_work_group_size metadata

Open jchlanda opened this issue 1 year ago • 11 comments

Emit metadata to describe number of dimensions specified in reqd_work_group_size. This is needed in order to be able to use that metadata correctly, since it was specified for OpenCL, and SYCL piggy-backs on it, backends correctly assert if all 3 dimensions are not provided. num_dim_reqd_work_group_size allows the compiler to padd the missing dimensions with 1, while preserving the notion of how many dimensions were specified.

jchlanda avatar Apr 30 '24 12:04 jchlanda

Corresponding UR work: https://github.com/oneapi-src/unified-runtime/pull/954

jchlanda avatar Apr 30 '24 12:04 jchlanda

Fixes: https://github.com/intel/llvm/issues/9353

jchlanda avatar Apr 30 '24 12:04 jchlanda

@jchlanda, could you please add FE test?

smanna12 avatar Apr 30 '24 13:04 smanna12

Regarding the naming, would this problem (and solution) not also apply to work_group_size_hint metadata? If so, could this metadata you're adding be made more general somehow? Maybe just trimming off the reqd_, or something else we can discuss down at the bike shed :)

frasercrmck avatar Apr 30 '24 13:04 frasercrmck

@jchlanda, could you please add FE test?

@smanna12 I've updated the test to use the new metadata node here: 6e4364344269567bd78f712d4bd9ad2a2c3c66b3, do you think this gives us enough coverage?

jchlanda avatar May 01 '24 10:05 jchlanda

Regarding the naming, would this problem (and solution) not also apply to work_group_size_hint metadata? If so, could this metadata you're adding be made more general somehow? Maybe just trimming off the reqd_, or something else we can discuss down at the bike shed :)

I think you are right, the spec this about the hint:

The number of arguments must match the dimensionality of the work-group used to invoke the kernel, and the order of the arguments matches the order of the dimension extents to the range constructor.

I'm happy to rename it to num_dim_work_group_size, or any other thing that you'd like to suggest @frasercrmck . Naming is hard...

jchlanda avatar May 01 '24 10:05 jchlanda

Regarding the naming, would this problem (and solution) not also apply to work_group_size_hint metadata? If so, could this metadata you're adding be made more general somehow? Maybe just trimming off the reqd_, or something else we can discuss down at the bike shed :)

I think you are right, the spec this about the hint:

The number of arguments must match the dimensionality of the work-group used to invoke the kernel, and the order of the arguments matches the order of the dimension extents to the range constructor.

I'm happy to rename it to num_dim_work_group_size, or any other thing that you'd like to suggest @frasercrmck . Naming is hard...

Yeah... work_group_num_dim, num_work_group_dim, num_dim_work_group (in rough priority order)... Not sure the size component is particularly important.

We've perhaps got a short window to rename it once it's merged but it'd be better to get it right first time, before downstream users start relying on the name!

frasercrmck avatar May 01 '24 15:05 frasercrmck

Aha! Am I understanding this right that the problem is due to the AMD target expecting all 3 dimensions in the "reqd_work_group_size" metadata node? If so, I can see how the current implementation is problematic. The alternative it to change it in upstream, but I suppose the padding may as well be done ASAP and then we can have the "specified dimensions" information separately for the SYCL case.

That's exactly right, we had a couple of goes at it in the past:

  • https://github.com/llvm/llvm-project/pull/68872
  • https://github.com/llvm/llvm-project/pull/72652

The feedback that we received was that we piggy packed on already existing metadata and we should not modify its semantics, something that I tend to agree with. Preserving the notion of the number of specified dimensions seems like the most elegant solution.

As for the approach on how to fix it, I think this is reasonable. However, I would prefer if metadata would be the same for all paths. Would it be possible to change the current implementation to always do the padding and then have the dimensions node on the side. It would simplify the general case of checking how many dimensions were specified, and cases that don't care can have a valid 3D value. Note that it might require changes to the SPIR-V translator.

I've not worked much with non-GPU targets, so please correct me if I'm wrong here, the reason for padding only for AMD/NVIDIA targets is that those two are the only ones that use the metadata mechanism: emit-program-metadata . And so my logic was that by not emitting reqd_work_group_size the workaround should not be present either.

jchlanda avatar May 06 '24 12:05 jchlanda

Regarding the naming, would this problem (and solution) not also apply to work_group_size_hint metadata? If so, could this metadata you're adding be made more general somehow? Maybe just trimming off the reqd_, or something else we can discuss down at the bike shed :)

I think you are right, the spec this about the hint:

The number of arguments must match the dimensionality of the work-group used to invoke the kernel, and the order of the arguments matches the order of the dimension extents to the range constructor.

I'm happy to rename it to num_dim_work_group_size, or any other thing that you'd like to suggest @frasercrmck . Naming is hard...

Yeah... work_group_num_dim, num_work_group_dim, num_dim_work_group (in rough priority order)... Not sure the size component is particularly important.

We've perhaps got a short window to rename it once it's merged but it'd be better to get it right first time, before downstream users start relying on the name!

Done, work_group_num_dim it is.

jchlanda avatar May 06 '24 12:05 jchlanda

I've not worked much with non-GPU targets, so please correct me if I'm wrong here, the reason for padding only for AMD/NVIDIA targets is that those two are the only ones that use the metadata mechanism: emit-program-metadata . And so my logic was that by not emitting reqd_work_group_size the workaround should not be present either.

Originally we always padded to fit with the existing semantics for the metadata. However, this meant that we lost information about the dimensionality, e.g. sycl::reqd_work_group_size(32) would be the same as sycl::reqd_work_group_size(1, 1, 32), while SYCL needs to know the dimensionality to properly diagnose invalid uses. At the time, the solution used was to make the metadata more flexible in a "less-is-more" approach, but I now see how that becomes problematic when trying to stay in line with upstream.

What I am now considering is whether we should go back on that approach and instead always do the padding (not only for HIP/CUDA) as well as always add the dimensionality information. That way, we do not need to keep making special handling for HIP/CUDA and we keep the semantics of the metadata object consistent again.

steffenlarsen avatar May 08 '24 07:05 steffenlarsen

What I am now considering is whether we should go back on that approach and instead always do the padding (not only for HIP/CUDA) as well as always add the dimensionality information. That way, we do not need to keep making special handling for HIP/CUDA and we keep the semantics of the metadata object consistent again.

@steffenlarsen I've aligned both spir-v and GPU paths now. Because we don't use module metadata for spir-v it required a switch to device requirements and an additional axis in the device module splitting logic.

Note that it might require changes to the SPIR-V translator.

Would you be able to elaborate on this, how would I spot the need? Would running spir-v tests with this branch be enough?

jchlanda avatar May 10 '24 13:05 jchlanda

The gen12 failures doesn't seem to be related to this PR.

jchlanda avatar May 15 '24 11:05 jchlanda

The gen12 failures doesn't seem to be related to this PR.

https://github.com/intel/llvm/pull/13503 needs to merge first to resolve those issues.

kbenzie avatar May 15 '24 13:05 kbenzie

The gen12 failures doesn't seem to be related to this PR.

#13503 needs to merge first to resolve those issues.

This has now been merged. Pull in the latest sycl branch changes to resolve the ASAN test failures.

kbenzie avatar May 16 '24 10:05 kbenzie

@intel/llvm-gatekeepers this is ready to go now, thank you.

jchlanda avatar May 21 '24 05:05 jchlanda