iree [GPU] Thread through a common target description

This commit starts threading through a common target description for various GPU backends. The goal is to unify how we specify the target across CUDA, HIP, Vulkan, and others, so that we can be consistent and reuse the some facilities.

Concretely, we introduce a #iree_gpu.target attribute in the IREEGPUDialect to describe the GPU target. It includes a few fields:

The canonical target architecture for compilation, e.g., sm_80 for cuda, gfx942 for hip
A TargetCoreAttr describing the GPU features and limits in a single GPU core (that is, AMD compute unit or NVIDIA streaming multiprocessor), e.g., compute/storage capabilities, subgroup sizes, and workgroup size limits
An optional TargetChipAttr describing GPU features for the final chip or product, e.g., core count

To help writing tests, we also introduce another shorthand #iree_gpu.alias_target attribute which contains the target API and the target arch/chip/product. The target arch/chip/product must be known to the compiler.

In order to support common development targets and the shorthand attribute, we list some common AMD/NVIDIA GPU architectures and their capabilities and limits. This is in general fine for datacenter GPU use cases where we only ever care one architecture.

This starts with update both LLVMGPU backends (CUDA and ROCm) to use the new target description. Vulkan side is untouched and the next step to unify.

Progress towards https://github.com/iree-org/iree/issues/16341

Apr 29 '24 15:04 antiagainst

@benvanik: I opted in to use EnumAttr because they makes the C++ code and IR much nicer to work with. I see your points regarding extensibility. I think we can relax to avoid some restrictions brought by it by using normal strings instead. Though stringifying everything doesn't seem a good end result either; so a balance to strike. A good way to go might be adding an additional dictionary to allow plugins inject whatever additional features bits they want. The current ones defined in the TD files are just those we support for ourselves; and hopefully those cover major use cases, given that we have a super broad GPU API and vendor architecture support anyway.

For the target API specifically though, agreed that's overly restrictive. For other enums, I don't expect GPUs to differ that greatly in a sense and our support of various GPU API/vendor/architecture hopefully gets us "comprehensively", this one is entirely big switch.. I think it makes sense to turn it into a string in the abbreviated attribute, and drop it in the full target. It's not that load bearing actually given other bits actually describe the target; it's more informative there.

Does the above sounds reasonable to you? Happy to chat a bit more directly.

May 01 '24 00:05 antiagainst

Makes sense with the data type enums being ints if you think they will be extended infrequently (like, with new GPU generations) or are fine with adding enum values that out-of-tree users may have even if nothing in-tree uses them. String enums is the way upstream handles things like this even if a bit more annoying to use :( It does allow for entirely out-of-tree work to happen, though, which is nice for private implementations and version shifting (user wants to use a pinned IREE version and even if they land the new enum value upstream would have to then juggle patches/forks/etc). If for a particular enum most are checking it instead of creating it then you can have static isFoo(attr) methods that make the comparison strongly typed (similar to how symbol visibility works with isPublic/etc), or static const char* that can be used for comparisons (if (stringAttr.getValue() == Foo::Bar)). Mostly just wanted to call it out because it's a pain to change later on - if you think int is fine then go for it.

Removing the target API seems best. I'm curious why it's needed in any form, though, as it feels duplicative with our target mechanism. Worth a chat so I can understand more about why you want it and if what you want may already be something we can do with those mechanisms or that they can be extended to support for you. For example, TargetBackend's getDefaultExecutableTargets/getHostExecutableTargets takes a deviceID (your target API) as well as its configuration dictionary: https://github.com/iree-org/iree/blob/e4b5a9350aee7cfafefb440a7f01b3d376842d92/compiler/src/iree/compiler/Dialect/HAL/Target/TargetBackend.h#L90-L102 We really don't want to have anything separate from our TargetDevice/TargetBackend as it's very tricky to manage extensibility, multi-targeting (including host/target cross-compilation), etc. Having common utility functions that GPU-based TargetBackend impls can use may be useful, vs. trying to bake it into IR. On my TODO list is a HAL attr that does default target expansion (which our flags will then end up mapping into) and if you're mostly looking to use this for making tests be less verbose that may be something to use.

May 01 '24 00:05 benvanik

Removing the target API seems best. I'm curious why it's needed in any form, though, as it feels duplicative with our target mechanism.

makes sense. a bit more context why including target api: it makes the target more standalone. but I agree here that we can get the target api from the executable target so better to avoid duplication. So I sent out 60f07cc to drop it entirely.

(even a bit more background--the target api is somewhat loadbearing particularly for spirv--for example if going down metal/webgpu, we may need different "legalization" patterns/passes for them, e.g., for webgpu. not sure we can always put it in a separate pass entry point; sometimes we may need to have "switches" inside existing patterns or so.. a bit sad but software is never perfect i guess..)

On my TODO list is a HAL attr that does default target expansion (which our flags will then end up mapping into) and if you're mostly looking to use this for making tests be less verbose that may be something to use.

This is why i'm introducing the iree_gpu.alias_target in a sense, cause I don't want to duplicate multiple lines of full target description everywhere in the lit tests.

May 01 '24 03:05 antiagainst

@benvanik I changed to 1) add an extra dictionary attr to allow better flexibility for out of tree targets, and 2) changed to use a cl option --iree-gpu-test-target to provide the target for lit tests instead of the alias target, as we chatted before. Could you take another look? thanks.

May 20 '24 16:05 antiagainst

Oh sorry, I think I caused those conflicts because I had the same problem with the enum definitions. Do you want me to fix them?

May 21 '24 00:05 qedawkins

Oh sorry, I think I caused those conflicts because I had the same problem with the enum definitions. Do you want me to fix them?

Don't worry. I think I fixed them. :)

May 21 '24 00:05 antiagainst

ok nice :D

May 21 '24 00:05 qedawkins

Oh actually new ones.. Let me see. :D

May 21 '24 00:05 antiagainst