Han-Chung Wang
Here is the input IR:

```mlir
hal.executable public @conv_2d_nchw_fchw_dispatch_1 {
  hal.executable.variant public @rocm_hsaco_fb target() {
    hal.executable.export public @conv_2d_nchw_fchw_dispatch_1_batch_matmul_64x968x4x320_f16xf16xf32 ordinal(0) layout(#hal.pipeline.layout) attributes {hal.interface.bindings = [#hal.interface.binding, #hal.interface.binding, #hal.interface.binding]} {
    ^bb0(%arg0: !hal.device):
      %x,...
```
Closing the issue because there are no action items.
The two prototypes I have for pack/unpack shape inference are https://github.com/openxla/iree/pull/16629 and https://github.com/openxla/iree/pull/16664.
Note: the flattening is needed for LHS packing as well.
I will use the three cases below to drive the optimization work.

```mlir
func.func @pack_i8(%source: tensor
```
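The snippet above is truncated; as a hedged illustration only, here is a minimal `tensor.pack` of i8 data with hypothetical shapes and tile sizes (a 128x256 source tiled by 16x2), not the exact case from the original comment:

```mlir
// Hypothetical example: tile a 128x256 i8 tensor into 16x2 inner tiles.
// Shapes and tile sizes are assumptions for illustration only.
func.func @pack_i8(%source: tensor<128x256xi8>,
                   %dest: tensor<8x128x16x2xi8>) -> tensor<8x128x16x2xi8> {
  // Dim 0 is tiled by 16 and dim 1 by 2, so the outer dims become
  // 128/16 = 8 and 256/2 = 128.
  %0 = tensor.pack %source inner_dims_pos = [0, 1] inner_tiles = [16, 2]
      into %dest : tensor<128x256xi8> -> tensor<8x128x16x2xi8>
  return %0 : tensor<8x128x16x2xi8>
}
```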
> Didn't we already have a pass to make the innermost dimension larger?

Yes, we do. The patterns make the innermost dimension as large as possible, i.e., they flatten...
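As a sketch of the flattening idea (reusing the assumed shapes from the hypothetical example above): when the innermost dims of the packed layout are contiguous, a `tensor.collapse_shape` can merge them into one larger innermost dimension:

```mlir
// Collapse the contiguous innermost 16x2 tile dims into a single dim of 32,
// so later patterns see the largest possible innermost dimension.
// Shapes are illustrative assumptions, not taken from the issue.
%flat = tensor.collapse_shape %packed [[0], [1], [2, 3]]
    : tensor<8x128x16x2xi8> into tensor<8x128x32xi8>
```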
https://github.com/iree-org/iree/pull/16456 should address the issue. I'll revisit how to land the PR.
@Max191 I think you have some local patches and ideas that are required for the mixed-types data-tiling work; could you add them to the tasklist accordingly? @bjacob, please help update this if...
For small tasks, adding a brief description to the tasklist is good enough. For large tasks, it would be good if you could create an issue/epic. It's not necessary to do...
The pack issue is temporarily covered by a ukernel. Let's focus on the unpack kernel in this issue. There are transpose variants of the unpack op; we need to take them into...
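For reference, a hedged sketch of what a transposed unpack variant can look like (shapes and permutations are illustrative assumptions): `outer_dims_perm` together with `inner_dims_pos` encodes the transpose that the kernel has to handle.

```mlir
// Hypothetical transposed unpack: the outer tile dims of the source are
// permuted ([1, 0]) relative to the destination, so unpacking implies a
// transpose in addition to reassembling the 16x16 tiles.
%0 = tensor.unpack %src
    outer_dims_perm = [1, 0]
    inner_dims_pos = [0, 1]
    inner_tiles = [16, 16]
    into %dest : tensor<8x16x16x16xf32> -> tensor<256x128xf32>
```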