MaheshRavishankar comments

Results 155 comments of MaheshRavishankar

IREEComprehensiveBufferize introducing extraneous copies/allocs

Eventually it should run in place (this is already done on the CPU side). The easiest path is to follow what is done for the CPU side here.

IREEComprehensiveBufferize introducing extraneous copies/allocs

I don't know if convert to destination passing style is used on the vmvx path. I was going to check that (haven't yet).

IREEComprehensiveBufferize introducing extraneous copies/allocs

Thanks for looking into it. I think it might be the same issue as #10406. Maybe this fixes it: https://github.com/iree-org/iree/commit/51cab31caabf2512cb1c25400a084713e713ae22

Fusing tensor.pad with consumer for LLVMCPU pipeline

@harsh-nod try patching this in https://reviews.llvm.org/D132355

Fusing tensor.pad with consumer for LLVMCPU pipeline

To unblock, you can increase the limit here: https://github.com/iree-org/iree/blob/ae72b956ea0701482fb95d170b40ed82e0e4ef46/compiler/src/iree/compiler/Codegen/LLVMCPU/LLVMCPUCheckIRBeforeLLVMConversion.cpp#L19. Eventually, to land this, we should just avoid fusing pad with pooling op consumers.

Specialize workgroup distribution

> > Left some initial comments. Having thought about this for a few days, I think the overall direction is fine. Layering this after the loop is generated (as is...

Specialize workgroup distribution

Wanted some clarification on what your goals for this PR are over the next few days. Is this WIP, or do you want to land this?

Specialize workgroup distribution

Based on discussion offline, I have a few questions about this approach based on more immediate things (leaving long-term things aside for now). One thing that was a next step...

Specialize workgroup distribution

> When `CyclicNumProcsEqNumIters` is used, the nested loops won't be generated, the bounded size is not recorded in `boundedSizesForLoops`, and the specialization won't kick in. I didn't put the...

Specialize workgroup distribution

> I think I need more background on `CyclicNumProcsEqNumIters` and how it helps remove `workload_per_workgroup`. How is `RemoveTrivialLoops` related to `CyclicNumProcsEqNumIters`? Might be easier to explain on GVC. I am...