[GPU] Thread through a common target description
This commit starts threading through a common target description for various GPU backends. The goal is to unify how we specify the target across CUDA, HIP, Vulkan, and others, so that we can be consistent and reuse the some facilities.
Concretely, we introduce a #iree_gpu.target attribute in the
IREEGPUDialect to describe the GPU target. It includes a few
fields:
- The canonical target architecture for compilation, e.g., sm_80 for cuda, gfx942 for hip
- A TargetCoreAttr describing the GPU features and limits in a single GPU core (that is, AMD compute unit or NVIDIA streaming multiprocessor), e.g., compute/storage capabilities, subgroup sizes, and workgroup size limits
- An optional TargetChipAttr describing GPU features for the final chip or product, e.g., core count
To help writing tests, we also introduce another shorthand
#iree_gpu.alias_target attribute which contains the
target API and the target arch/chip/product. The target
arch/chip/product must be known to the compiler.
In order to support common development targets and the shorthand attribute, we list some common AMD/NVIDIA GPU architectures and their capabilities and limits. This is in general fine for datacenter GPU use cases where we only ever care one architecture.
This starts with update both LLVMGPU backends (CUDA and ROCm) to use the new target description. Vulkan side is untouched and the next step to unify.
Progress towards https://github.com/iree-org/iree/issues/16341
@benvanik: I opted in to use EnumAttr because they makes the C++ code and IR much nicer to work with. I see your points regarding extensibility. I think we can relax to avoid some restrictions brought by it by using normal strings instead. Though stringifying everything doesn't seem a good end result either; so a balance to strike. A good way to go might be adding an additional dictionary to allow plugins inject whatever additional features bits they want. The current ones defined in the TD files are just those we support for ourselves; and hopefully those cover major use cases, given that we have a super broad GPU API and vendor architecture support anyway.
For the target API specifically though, agreed that's overly restrictive. For other enums, I don't expect GPUs to differ that greatly in a sense and our support of various GPU API/vendor/architecture hopefully gets us "comprehensively", this one is entirely big switch.. I think it makes sense to turn it into a string in the abbreviated attribute, and drop it in the full target. It's not that load bearing actually given other bits actually describe the target; it's more informative there.
Does the above sounds reasonable to you? Happy to chat a bit more directly.
Makes sense with the data type enums being ints if you think they will be extended infrequently (like, with new GPU generations) or are fine with adding enum values that out-of-tree users may have even if nothing in-tree uses them. String enums is the way upstream handles things like this even if a bit more annoying to use :( It does allow for entirely out-of-tree work to happen, though, which is nice for private implementations and version shifting (user wants to use a pinned IREE version and even if they land the new enum value upstream would have to then juggle patches/forks/etc). If for a particular enum most are checking it instead of creating it then you can have static isFoo(attr) methods that make the comparison strongly typed (similar to how symbol visibility works with isPublic/etc), or static const char* that can be used for comparisons (if (stringAttr.getValue() == Foo::Bar)). Mostly just wanted to call it out because it's a pain to change later on - if you think int is fine then go for it.
Removing the target API seems best. I'm curious why it's needed in any form, though, as it feels duplicative with our target mechanism. Worth a chat so I can understand more about why you want it and if what you want may already be something we can do with those mechanisms or that they can be extended to support for you. For example, TargetBackend's getDefaultExecutableTargets/getHostExecutableTargets takes a deviceID (your target API) as well as its configuration dictionary: https://github.com/iree-org/iree/blob/e4b5a9350aee7cfafefb440a7f01b3d376842d92/compiler/src/iree/compiler/Dialect/HAL/Target/TargetBackend.h#L90-L102 We really don't want to have anything separate from our TargetDevice/TargetBackend as it's very tricky to manage extensibility, multi-targeting (including host/target cross-compilation), etc. Having common utility functions that GPU-based TargetBackend impls can use may be useful, vs. trying to bake it into IR. On my TODO list is a HAL attr that does default target expansion (which our flags will then end up mapping into) and if you're mostly looking to use this for making tests be less verbose that may be something to use.
Removing the target API seems best. I'm curious why it's needed in any form, though, as it feels duplicative with our target mechanism.
makes sense. a bit more context why including target api: it makes the target more standalone. but I agree here that we can get the target api from the executable target so better to avoid duplication. So I sent out 60f07cc to drop it entirely.
(even a bit more background--the target api is somewhat loadbearing particularly for spirv--for example if going down metal/webgpu, we may need different "legalization" patterns/passes for them, e.g., for webgpu. not sure we can always put it in a separate pass entry point; sometimes we may need to have "switches" inside existing patterns or so.. a bit sad but software is never perfect i guess..)
On my TODO list is a HAL attr that does default target expansion (which our flags will then end up mapping into) and if you're mostly looking to use this for making tests be less verbose that may be something to use.
This is why i'm introducing the iree_gpu.alias_target in a sense, cause I don't want to duplicate multiple lines of full target description everywhere in the lit tests.
@benvanik I changed to 1) add an extra dictionary attr to allow better flexibility for out of tree targets, and 2) changed to use a cl option --iree-gpu-test-target to provide the target for lit tests instead of the alias target, as we chatted before. Could you take another look? thanks.
Oh sorry, I think I caused those conflicts because I had the same problem with the enum definitions. Do you want me to fix them?
Oh sorry, I think I caused those conflicts because I had the same problem with the enum definitions. Do you want me to fix them?
Don't worry. I think I fixed them. :)
ok nice :D
Oh actually new ones.. Let me see. :D