Han-Chung Wang
Putting a note here. I think the current plan is:
1. Enable unpack ukernels.
2. Learn the performance gap.
3. Plan out the work for unpack codegen.
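For anyone landing here, a rough standalone sketch (not IREE code) of the data movement an unpack performs, assuming the simplest 2-D case with no outer/inner permutation and no padding: the packed `[M/T0][N/T1][T0][T1]` layout is scattered back into a row-major `M x N` matrix. This is the strided copy the ukernel, and eventually the codegen path, has to do efficiently.

```cpp
#include <vector>

// Tile sizes T0/T1 are illustrative; assumes M % T0 == 0 and N % T1 == 0
// (no padding) and no inner/outer dim permutation.
void unpack2D(const std::vector<float> &packed, std::vector<float> &unpacked,
              int M, int N, int T0, int T1) {
  const int tilesPerRow = N / T1;
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      // Linear index of element (m, n) inside the packed
      // [M/T0][N/T1][T0][T1] layout.
      const int src =
          (((m / T0) * tilesPerRow + n / T1) * T0 + m % T0) * T1 + n % T1;
      unpacked[m * N + n] = packed[src];
    }
  }
}
```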
It should be done in https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/GlobalOptimization/DataLayoutPropagation.cpp
I don't know. It is here after set encoding. A sequence of linalg ops is raised to a softmax op in the GlobalOptimization stage. Are we able to push down reshape ops...
Oh, this is mostly a quick experimental flag. Sometimes we want to conditionally select some ops (e.g., contraction ops) and demote their input operands from fp32 to bf16 types. It...
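For context, a standalone sketch (not IREE code) of what the fp32 -> bf16 demotion means numerically: bf16 keeps fp32's sign and 8-bit exponent but only 7 mantissa bits, so the demotion is essentially a rounding truncation of the fp32 bit pattern.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Demote fp32 -> bf16 -> fp32 with round-to-nearest-even on the dropped bits.
static float demoteToBf16(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  // Add the rounding bias, then clear the 16 low mantissa bits bf16 drops.
  uint32_t rounded = bits + 0x7FFFu + ((bits >> 16) & 1u);
  uint32_t bf16Bits = rounded & 0xFFFF0000u;
  float result;
  std::memcpy(&result, &bf16Bits, sizeof(result));
  return result;
}

int main() {
  const float inputs[] = {1.0f, 3.14159265f, 0.1f, 1000.001f};
  for (float x : inputs)
    std::printf("fp32 %.9g -> bf16 %.9g\n", x, demoteToBf16(x));
  return 0;
}
```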
I forgot to drop my initial comments. I think having big tile sizes (e.g., 16x16x16) is not a good idea. We need a plan to codegen it properly. @pashu123 is going...
> It appears the issue is in `LLVMCPUVectorTransferLowering`. There is a full unrolling making the dispatch rather unruly.

The unrolling is needed because the LLVM backend wants 1D vectors. It could...
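To make the blow-up concrete, here is a conceptual analogy in plain C++ (not the actual pass): an n-D transfer is decomposed into contiguous 1-D copies because only the innermost dim stays a vector. If the surrounding loops are fully unrolled instead of kept as loops, a single 16x16x16 tile already turns into 16 * 16 = 256 separate 1-D ops, which is what bloats the dispatch.

```cpp
#include <array>
#include <cstring>

constexpr int kTile = 16;
using Tile3D = std::array<std::array<std::array<float, kTile>, kTile>, kTile>;

// Copy a 3-D tile as a sequence of contiguous 1-D row copies; only the
// innermost dimension is handled as a vector-sized chunk.
void transferTile(const Tile3D &src, Tile3D &dst) {
  for (int i = 0; i < kTile; ++i)     // these two loops are what a full
    for (int j = 0; j < kTile; ++j)   // unrolling would expand into 256 copies
      std::memcpy(dst[i][j].data(), src[i][j].data(), kTile * sizeof(float));
}
```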
Okay, so this is similar to what I'm seeing in https://github.com/iree-org/iree/issues/17226#issuecomment-2087747095. IMO, we should not fuse these two generic ops; TileAndFuse is basically broken for this case. There are no...
@pashu123 please help take a look to see if there are other issues, apart from the fusion issue.
Perhaps you can try https://github.com/llvm/torch-mlir/pull/3277. It should fix the embedding lookup issue at the torch level.
> That gets further, yeah :D. Might be enough to call this particular issue fixed?

There is an action item at the Linalg level: https://github.com/iree-org/iree/issues/17226#issuecomment-2093718610

> I do see another error...