GPU data tiling: query the target's list of MMA intrinsics. Add FP8 test.
The existing code kept its own list of usable MFMA intrinsics and then checked it against the target. Flipping this around, we can simply query the list from the target.
The only subtlety is that the target may support multiple intrinsics for a given combination of element types, in which case we have to choose one.
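As a rough illustration of the "query, filter, then pick one" flow, here is a minimal self-contained sketch. All names in it (`Target`, `MmaIntrinsic`, `chooseIntrinsic`, `isBetter`) are hypothetical stand-ins, not the identifiers used in this PR, and the tie-break shown (larger M*N tile first, then larger K) is only an assumed example, not necessarily the heuristic this PR implements.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the types used in the PR.
enum class ElemType { F32, F16, BF16, F8, I8, I32 };

struct MmaIntrinsic {
  std::string name;        // illustrative label only
  ElemType lhs, rhs, acc;  // operand and accumulator element types
  int m, n, k;             // tile shape
};

// Stand-in for "the list queried from the target".
struct Target {
  std::vector<MmaIntrinsic> mmaIntrinsics;
};

// Assumed tie-break, for illustration only: larger M*N wins, then larger K.
bool isBetter(const MmaIntrinsic &a, const MmaIntrinsic &b) {
  if (a.m * a.n != b.m * b.n)
    return a.m * a.n > b.m * b.n;
  return a.k > b.k;
}

// Returns the preferred intrinsic for the element types, or nullptr if the
// target offers none. The pointer refers into target.mmaIntrinsics.
const MmaIntrinsic *chooseIntrinsic(const Target &target, ElemType lhs,
                                    ElemType rhs, ElemType acc) {
  const MmaIntrinsic *best = nullptr;
  for (const MmaIntrinsic &candidate : target.mmaIntrinsics) {
    assert(candidate.m > 0 && candidate.n > 0 && candidate.k > 0 &&
           "tile dimensions must be positive");
    if (candidate.lhs != lhs || candidate.rhs != rhs || candidate.acc != acc)
      continue;  // wrong element types for this matmul
    if (!best || isBetter(candidate, *best))
      best = &candidate;
  }
  return best;
}
```

Because the candidate list comes from the target, supporting a new element type such as FP8 needs no change to a hard-coded table in this sketch; only the tie-break has to stay deterministic.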
This PR also changes std::optional<Attr> to just Attr: a default-constructed Attr is already null-ish, so there is no need for a second null value.
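To show why the extra std::optional layer is redundant, here is a toy sketch of a handle type with a built-in null state. `Attr`, `AttrStorage`, and `findAttr` below are simplified, hypothetical stand-ins written for this example; they only mimic the general shape of a pointer-like attribute handle and are not the real class.

```cpp
#include <cassert>

// Toy backing storage; hypothetical, for illustration only.
struct AttrStorage {
  int intrinsicId;
};

// Toy pointer-like handle: default construction yields the null state.
class Attr {
public:
  Attr() = default;
  explicit Attr(const AttrStorage *impl) : impl(impl) {}

  // True when the handle refers to something.
  explicit operator bool() const { return impl != nullptr; }

  int intrinsicId() const {
    assert(impl && "dereferencing a null Attr");
    return impl->intrinsicId;
  }

private:
  const AttrStorage *impl = nullptr;
};

// With std::optional<Attr> there would be two ways to say "nothing":
// std::nullopt and an engaged optional holding a null Attr. Returning
// Attr directly leaves exactly one null state to check.
Attr findAttr(const AttrStorage *table, int count, int wantedId) {
  for (int i = 0; i < count; ++i)
    if (table[i].intrinsicId == wantedId)
      return Attr(&table[i]);
  return Attr();  // null handle signals "not found"
}
```

Callers then test the result with a plain `if (attr)` instead of checking both the optional and the value it wraps.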
The heuristic added in this PR is designed to match the existing choices so that the tests don't need to change. These existing choices are also what maximizes performance on some microbenchmarks, but we already know that they may be counterproductive in real scenarios where the bottleneck is power.
The test gains an FP8 test case, along with some renaming to simplify function names (which had become a lie in some test cases).