iree icon indicating copy to clipboard operation
iree copied to clipboard

[spirv] Add lowering configuration verification logic

Open antiagainst opened this issue 2 years ago • 4 comments

Specifying lowering configuration in the original source module was enabled via https://github.com/iree-org/iree/pull/10430, which allows external searching/tuning. We'd need to add parameters verification to make sure external numbers are consistent. This should be doing something similar to the LLVM pipelines, e.g., https://github.com/iree-org/iree/pull/10147. We can refer to both LLVMCPULowerExecutableTarget.cpp and LLVMGPULowerExecutableTarget.cpp for details.

antiagainst avatar Sep 21 '22 20:09 antiagainst

Workgroup size [x, y, z] constraints:

  • x * y * z should not exceed max_compute_workgrup_invocations of the target architecture (from spv.target_env).
  • Each dimension should not exceed the corresponding limit in max_compute_workgroup_size form spv.target_env
  • Each dimension should be some power of two.
  • x * y * z should be be a multiple of the subgroup_size from spv.target_env.

Shared memory usage constraints:

  • Shared memory used should not exceed max_compute_shared_memory_size (in bytes) from spv.target_env.

If we have #10499 done, the constraints for matmul/convolution ops are:

(batch) matmul [(workgroup_tile_b,) workgroup_tile_m, workgroup_tile_n, thread_tile_k] constraints:

  • the input matmul's B/M/N/K dimension size should be a multiplier of workgroup_tile_{b|m|n|k}.
  • workgroup_tile_{b|m|n} should be a multiplier of workgroup size z/y/x.
  • thread_tile_{k|n} should preferably be some power of two.

2-D NHWC convolution tile sizes [wg_tile_oh, wg_tile_ow, t_tile_oc, t_tile_fh, t_tile_ic] constraints:

  • the input convolution's OH/OW/OC dimension size should be a multiplier of wg_tile_{oh|ow|oc}.
  • wg_tile_{oh|ow|oc} should be a multiplier of workgroup size z/y/x.
  • t_tile_{fh|fw} should be 1.
  • t_tile_ic should preferably be some power of two.

There are other variants of convolution, but they should be similar to the above.

antiagainst avatar Sep 21 '22 22:09 antiagainst

Thanks for the details of the constraints. I have some questions.

  1. How can I get max_compute_workgrup_invocations, max_compute_workgroup_size and spv.target_env? Is there any document/codes related to this?
  2. Is the third level of tile sizes [0, 0, 0, 0, 1, 1, 4] fixed? So t_tile_ic should be equal to 4?

yzhang93 avatar Sep 22 '22 02:09 yzhang93

For 1. you can use getSPIRVTargetEnvAttr() (defined here). You can search the codebase to see how it should be used. For 2. IC is actually tunable. But being 4 is probably the best. FH/FW should always be 1, so fixed yes.

antiagainst avatar Sep 22 '22 22:09 antiagainst

Hi @yzhang93 I assume you will be working on this; so I'll assign this issue to you for now. :)

antiagainst avatar Sep 23 '22 00:09 antiagainst

@antiagainst Sure, just start working on it. I have another question. How to get shared memory usage for matmul and conv?

yzhang93 avatar Sep 27 '22 21:09 yzhang93

@antiagainst Sure, just start working on it. I have another question. How to get shared memory usage for matmul and conv?

You can follow the logic on CUDA side here.

antiagainst avatar Sep 27 '22 22:09 antiagainst

@antiagainst Sure, just start working on it. I have another question. How to get shared memory usage for matmul and conv?

You can follow the logic on CUDA side here.

Thanks, I got the idea how to get the shared memory for matmul. Just want to make sure the calculation for conv.

yzhang93 avatar Sep 27 '22 22:09 yzhang93

No shared memory are used for conv right now; so we don't need to verify that.

antiagainst avatar Sep 27 '22 22:09 antiagainst

@antiagainst Okay, got it. Thanks!

yzhang93 avatar Sep 27 '22 22:09 yzhang93