iree
iree copied to clipboard
[spirv] Add lowering configuration verification logic
Specifying lowering configuration in the original source module was enabled via https://github.com/iree-org/iree/pull/10430, which allows external searching/tuning. We'd need to add parameters verification to make sure external numbers are consistent. This should be doing something similar to the LLVM pipelines, e.g., https://github.com/iree-org/iree/pull/10147. We can refer to both LLVMCPULowerExecutableTarget.cpp and LLVMGPULowerExecutableTarget.cpp for details.
Workgroup size [x, y, z]
constraints:
-
x * y * z
should not exceedmax_compute_workgrup_invocations
of the target architecture (fromspv.target_env
). - Each dimension should not exceed the corresponding limit in
max_compute_workgroup_size
formspv.target_env
- Each dimension should be some power of two.
-
x * y * z
should be be a multiple of thesubgroup_size
fromspv.target_env
.
Shared memory usage constraints:
- Shared memory used should not exceed
max_compute_shared_memory_size
(in bytes) fromspv.target_env
.
If we have #10499 done, the constraints for matmul/convolution ops are:
(batch) matmul [(workgroup_tile_b,) workgroup_tile_m, workgroup_tile_n, thread_tile_k]
constraints:
- the input matmul's B/M/N/K dimension size should be a multiplier of
workgroup_tile_{b|m|n|k}
. -
workgroup_tile_{b|m|n}
should be a multiplier of workgroup size z/y/x. -
thread_tile_{k|n}
should preferably be some power of two.
2-D NHWC convolution tile sizes [wg_tile_oh, wg_tile_ow, t_tile_oc, t_tile_fh, t_tile_ic]
constraints:
- the input convolution's OH/OW/OC dimension size should be a multiplier of
wg_tile_{oh|ow|oc}
. -
wg_tile_{oh|ow|oc}
should be a multiplier of workgroup size z/y/x. -
t_tile_{fh|fw}
should be 1. -
t_tile_ic
should preferably be some power of two.
There are other variants of convolution, but they should be similar to the above.
Thanks for the details of the constraints. I have some questions.
- How can I get
max_compute_workgrup_invocations
,max_compute_workgroup_size
andspv.target_env
? Is there any document/codes related to this? - Is the third level of tile sizes [0, 0, 0, 0, 1, 1, 4] fixed? So t_tile_ic should be equal to 4?
For 1. you can use getSPIRVTargetEnvAttr()
(defined here). You can search the codebase to see how it should be used.
For 2. IC is actually tunable. But being 4 is probably the best. FH/FW should always be 1, so fixed yes.
Hi @yzhang93 I assume you will be working on this; so I'll assign this issue to you for now. :)
@antiagainst Sure, just start working on it. I have another question. How to get shared memory usage for matmul and conv?
@antiagainst Sure, just start working on it. I have another question. How to get shared memory usage for matmul and conv?
You can follow the logic on CUDA side here.
@antiagainst Sure, just start working on it. I have another question. How to get shared memory usage for matmul and conv?
You can follow the logic on CUDA side here.
Thanks, I got the idea how to get the shared memory for matmul. Just want to make sure the calculation for conv.
No shared memory are used for conv right now; so we don't need to verify that.
@antiagainst Okay, got it. Thanks!