Muzammiluddin Syed issues

Results: 11 issues



Improve SortOp canonicalization pattern to also drop unused results in buffer semantics

### Request description

- Original issue: https://github.com/iree-org/iree/issues/20699
- Incomplete PR: https://github.com/iree-org/iree/pull/20827

### What component(s) does this issue relate to?

Compiler

### Additional context

_No response_

enhancement ➕

codegen

onboarding/codegen

[Codegen][NVVM] Add lowering in NVVM for clustered subgroup reduction and do associated clean up

### Request description

See: https://github.com/iree-org/iree/pull/20468#discussion_r2126945727

### What component(s) does this issue relate to?

_No response_

### Additional context

_No response_

enhancement ➕

Add support for GPUPrintfOp

### Request description

There is a lowering of `GPUPrintfOp`s in [upstream LLVM](https://github.com/llvm/llvm-project/blob/46f90165be92e08e059dcc07d42347cbf7446a0b/mlir/lib/Conversion/GPUCommon/GPUOpsLowering.h#L143), and it would be helpful for quality of life and debugging if we could also make use of...

enhancement ➕

[CodeGen][SPIRV] Lowering for clustered reduce not implemented

### What happened?

# Context

To make effective use of DPP operations available to AMD GPUs, the PR below changed the implementation of warp reduction to preserve `subgroup_reduce` ops rather...

bug 🐞
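To make "clustered reduce" concrete, here is a minimal Python model. It assumes contiguous, power-of-two clusters (a cluster stride of 1) and an associative, commutative combining op. It compares a reference definition, where every lane receives the reduction of its own cluster, with a butterfly (xor-shuffle) formulation, which is one common way to lower such reductions to lane shuffles. This is an illustrative sketch, not the IREE or upstream MLIR lowering, and the helper names (`shuffle_xor`, `clustered_reduce_reference`, `clustered_reduce_butterfly`) are hypothetical.

```python
from operator import add
from typing import Callable, List

Op = Callable[[int, int], int]


def shuffle_xor(values: List[int], offset: int) -> List[int]:
    """Model of a subgroup xor-shuffle: lane i reads the value held by lane i ^ offset."""
    return [values[i ^ offset] for i in range(len(values))]


def clustered_reduce_reference(values: List[int], cluster_size: int, op: Op) -> List[int]:
    """Reference semantics: each lane gets the reduction of its contiguous cluster."""
    out: List[int] = []
    for start in range(0, len(values), cluster_size):
        cluster = values[start:start + cluster_size]
        acc = cluster[0]
        for v in cluster[1:]:
            acc = op(acc, v)
        # Every lane in the cluster observes the same reduced value.
        out.extend([acc] * cluster_size)
    return out


def clustered_reduce_butterfly(values: List[int], cluster_size: int, op: Op) -> List[int]:
    """Butterfly formulation: log2(cluster_size) xor-shuffle steps.

    Offsets smaller than a power-of-two cluster_size never leave an aligned
    cluster, so each lane ends up with the reduction of exactly its own cluster.
    """
    n = len(values)
    if cluster_size <= 0 or cluster_size & (cluster_size - 1) != 0:
        raise ValueError("cluster_size must be a positive power of two")
    if n % cluster_size != 0:
        raise ValueError("cluster_size must divide the subgroup size")
    vals = list(values)
    offset = 1
    while offset < cluster_size:
        partner = shuffle_xor(vals, offset)
        vals = [op(a, b) for a, b in zip(vals, partner)]
        offset *= 2
    return vals


if __name__ == "__main__":
    lanes = list(range(16))  # a toy 16-lane subgroup
    print(clustered_reduce_butterfly(lanes, 4, add))
    print(clustered_reduce_reference(lanes, 4, add))
```

With a cluster size of 4, both functions should agree: lanes 0 to 3 hold 6, lanes 4 to 7 hold 22, and so on. A cluster size equal to the subgroup size degenerates to an ordinary (non-clustered) subgroup reduction.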

Initial commit does not build

Draft PR for initial review. Adds lowering support for RoiAlign ops with static shapes and known sampling ratios.

[DO NOT MERGE] Experimental branch for benchmarking FP4 Gemms

[LLVMGPU] Account for scales when picking schedule

Context: https://github.com/iree-org/iree/pull/22737#discussion_r2577897836

During the schedule selection process, there are various places where we still do not adequately support scaled intrinsics:

- The selection of the A.I cutoff points for GEMMs...

yolo

[Cmake] Make better use of Bazel to reduce the amount of hand-authored CMake code

[bindings] Add support for DenseI32ArrayAttr in `getIntArrayAttrValues`

[gfx950][mxfp4] Verify the state of current heuristics