MaheshRavishankar

155 comments by MaheshRavishankar

Eventually it should run in place (this is what the CPU side already does). The easiest path is to follow the CPU-side approach here.

I don't know if convert-to-destination-passing-style is used on the VMVX path. I was going to check that (I haven't yet).

Thanks for looking into it. I think it might be the same issue as #10406. Maybe this commit fixes it: https://github.com/iree-org/iree/commit/51cab31caabf2512cb1c25400a084713e713ae22

@harsh-nod, try patching in https://reviews.llvm.org/D132355.

To unblock, you can increase the limit here: https://github.com/iree-org/iree/blob/ae72b956ea0701482fb95d170b40ed82e0e4ef46/compiler/src/iree/compiler/Codegen/LLVMCPU/LLVMCPUCheckIRBeforeLLVMConversion.cpp#L19. Eventually, to land this, we should just avoid fusing pad ops with pooling-op consumers.

> > Left some initial comments. Having thought about this for a few days, I think the overall direction is fine. Layering this after the loop is generate (as is...

I wanted some clarification on what your goals for this PR are over the next few days. Is this WIP, or do you want to land it?

Based on the offline discussion, I have a few questions about this approach concerning more immediate issues (leaving long-term things aside for now). One thing that was a next step...

> When `CyclicNumProcsEqNumIters` is used, the nested loops won't be generated and the bounded size is not recorded in `boundedSizesForLoops` and the specialization won't kick in. I didn't put the...
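For context on why no loop (and hence no recorded bound) exists under `CyclicNumProcsEqNumIters`, here is a minimal Python sketch that only simulates which iterations each processor executes. It is modeled on the documented semantics of MLIR's `linalg::DistributionMethod` (`Cyclic`, `CyclicNumProcsGeNumIters`, `CyclicNumProcsEqNumIters`); the function names are hypothetical, and it does not model IREE's code generation or `boundedSizesForLoops`.

```python
def cyclic(proc_id, nprocs, lb, ub, step):
    # Cyclic: no assumption about nprocs vs. trip count, so each processor
    # still runs a loop starting at lb + proc_id*step with stride step*nprocs.
    return list(range(lb + proc_id * step, ub, step * nprocs))


def cyclic_num_procs_ge_num_iters(proc_id, lb, ub, step):
    # nprocs >= trip count: the loop collapses to a single guarded iteration.
    iv = lb + proc_id * step
    return [iv] if iv < ub else []


def cyclic_num_procs_eq_num_iters(proc_id, lb, step):
    # nprocs == trip count: no loop and no bounds check are materialized;
    # the induction variable is computed directly from the processor id.
    return [lb + proc_id * step]


if __name__ == "__main__":
    lb, ub, step = 0, 8, 2  # 4 iterations: 0, 2, 4, 6
    for p in range(4):
        print(p, cyclic(p, 4, lb, ub, step), cyclic_num_procs_eq_num_iters(p, lb, step))
```

With four processors and four iterations, all three methods cover the same iteration space, but only `Cyclic` keeps an actual loop whose bounds a later pass could inspect.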

> I think I need more background of `CyclicNumProcsEqNumIters` and how it helps removing `workload_per_workgroup`. How is `RemoveTrivialLoops` related to `CyclicNumProcsEqNumIters`? Might be easier to explain on GVC. I am...