Han-Chung Wang
Putting a note here. I think the current plan is:
1. Enable unpack ukernels.
2. Learn the performance gap.
3. Plan out the work for unpack codegen.
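For anyone landing here, a rough standalone sketch (not IREE code) of the data movement an unpack performs, assuming the simplest 2-D case with no outer/inner permutation and no padding: the packed `[M/T0][N/T1][T0][T1]` layout is scattered back into a row-major `M x N` matrix. This is the strided copy the ukernel, and eventually the codegen path, has to do efficiently.

```cpp
#include <vector>

// Tile sizes T0/T1 are illustrative; assumes M % T0 == 0 and N % T1 == 0
// (no padding) and no inner/outer dim permutation.
void unpack2D(const std::vector<float> &packed, std::vector<float> &unpacked,
              int M, int N, int T0, int T1) {
  const int tilesPerRow = N / T1;
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      // Linear index of element (m, n) inside the packed
      // [M/T0][N/T1][T0][T1] layout.
      const int src =
          (((m / T0) * tilesPerRow + n / T1) * T0 + m % T0) * T1 + n % T1;
      unpacked[m * N + n] = packed[src];
    }
  }
}
```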
It should be done in https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/GlobalOptimization/DataLayoutPropagation.cpp
I don't know. It is here after set encoding. A sequence of linalg ops is raised to a softmax op in the GlobalOptimization stage. Are we able to push down reshape ops...
Oh, this is mostly a quick experimental flag. Sometimes we want to conditionally select some ops (e.g., contraction ops) and demote their input operands from fp32 to bf16 types. It...
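For context, a standalone sketch (not IREE code) of what the fp32 -> bf16 demotion means numerically: bf16 keeps fp32's sign and 8-bit exponent but only 7 mantissa bits, so the demotion is essentially a rounding truncation of the fp32 bit pattern.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Demote fp32 -> bf16 -> fp32 with round-to-nearest-even on the dropped bits.
static float demoteToBf16(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  // Add the rounding bias, then clear the 16 low mantissa bits bf16 drops.
  uint32_t rounded = bits + 0x7FFFu + ((bits >> 16) & 1u);
  uint32_t bf16Bits = rounded & 0xFFFF0000u;
  float result;
  std::memcpy(&result, &bf16Bits, sizeof(result));
  return result;
}

int main() {
  const float inputs[] = {1.0f, 3.14159265f, 0.1f, 1000.001f};
  for (float x : inputs)
    std::printf("fp32 %.9g -> bf16 %.9g\n", x, demoteToBf16(x));
  return 0;
}
```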
I forgot to drop my initial comments. I think having big tile sizes (e.g., 16x16x16) is not a good idea. We need a plan to codegen it properly. @pashu123 is going...
> It appears the issue is in `LLVMCPUVectorTransferLowering`. There is a full unrolling making the dispatch rather unruly.

The unrolling is needed because the LLVM backend wants 1D vectors. It could...
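To make the blow-up concrete, here is a conceptual analogy in plain C++ (not the actual pass): an n-D transfer is decomposed into contiguous 1-D copies because only the innermost dim stays a vector. If the surrounding loops are fully unrolled instead of kept as loops, a single 16x16x16 tile already turns into 16 * 16 = 256 separate 1-D ops, which is what bloats the dispatch.

```cpp
#include <array>
#include <cstring>

constexpr int kTile = 16;
using Tile3D = std::array<std::array<std::array<float, kTile>, kTile>, kTile>;

// Copy a 3-D tile as a sequence of contiguous 1-D row copies; only the
// innermost dimension is handled as a vector-sized chunk.
void transferTile(const Tile3D &src, Tile3D &dst) {
  for (int i = 0; i < kTile; ++i)     // these two loops are what a full
    for (int j = 0; j < kTile; ++j)   // unrolling would expand into 256 copies
      std::memcpy(dst[i][j].data(), src[i][j].data(), kTile * sizeof(float));
}
```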
Okay, so this is similar to what I'm seeing in https://github.com/iree-org/iree/issues/17226#issuecomment-2087747095. IMO, we should not fuse these two generic ops; TileAndFuse is basically broken for this case. There are no...
@pashu123 please help take a look to see if there are other issues, apart from the fusion issue.
Perhaps you can try https://github.com/llvm/torch-mlir/pull/3277. It should fix the embedding lookup issue at the torch level.
> That gets further, yeah :D. Might be enough to call this particular issue fixed?

There is an action item at the Linalg level: https://github.com/iree-org/iree/issues/17226#issuecomment-2093718610

> I do see another error...