Add support for GPUPrintfOp
Request description
There is a lowering of GPUPrintfOps in upstream LLVM, and it would help with quality of life and debugging if we could also use this op to read intermediate values in the IR.
What component(s) does this issue relate to?
No response
Additional context
No response
cc: @krzysz00
This won't be implemented the way it is upstream, but the functionality for it is present; someone just needs to wire it up (and it has proven not to be useful enough for anyone to do that :). Adding a tensor output is usually sufficient.
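As a rough illustration of the "add a tensor output" workaround, here is a minimal sketch in plain Python. It is not IREE code: the function names `model` and `model_debug` and the arithmetic are hypothetical, and lists stand in for tensors. The point is only that the intermediate value is returned next to the real result so it can be inspected on the host.

```python
# Hypothetical stand-in for a compiled function; lists play the role of tensors.
def model(x):
    hidden = [v * 2.0 for v in x]      # intermediate value we would like to inspect
    return [h + 1.0 for h in hidden]   # only the final result reaches the caller


# Debug variant: same computation, but the intermediate is an extra output.
def model_debug(x):
    hidden = [v * 2.0 for v in x]
    out = [h + 1.0 for h in hidden]
    return out, hidden


if __name__ == "__main__":
    out, hidden = model_debug([1.0, 2.0, 3.0])
    print("hidden:", hidden)
    print("out:", out)
```

The debug variant changes the function signature, so it is something to add temporarily while investigating rather than leave in place.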
The way to implement it in IREE is to lower it into a hal.instrument.print op, add support for that op in the LLVMGPU backend as was done on the CPU side, and use the runtime instrumentation flags to get access to the output. That path can be updated to stream to stdout/etc., but no one has ever used it, so the rest of the UX is incomplete. See https://github.com/iree-org/iree/pull/12357.