Han-Chung Wang comments

Results: 336 comments by Han-Chung Wang

[CPU] linalg.pack op is not fused in forall distribution

Here is the snippet that I've been playing with today, in case it helps: https://gist.github.com/hanhanW/4678117dc4767c1568b758caf43e0063 Run: `iree-opt --iree-transform-dialect-interpreter -canonicalize -cse ~/repro.mlir`. There is a bug in...

[CPU] linalg.pack op is not fused in forall distribution

> There is a bug in transform.iree.fuse_consumer, and I'm preparing a fix.

Here is the fix: https://github.com/iree-org/iree/pull/20869

[CPU] linalg.pack op is not fused in forall distribution

I promised to share my idea, but I failed to. I think we had good observations on Discord, so I'm mirroring the context to the issue. From @egebeysel: Hi folks!...

[CPU] linalg.pack op is not fused in forall distribution

## MapScatter Approach

After talking to @MaheshRavishankar today (and maybe @Max191 in the other meeting recently), we have a more robust fix. It may not be performant, but it prevents...

[CPU] linalg.pack op is not fused in forall distribution

Re: the map_scatter approach, we'd need https://github.com/iree-org/iree/issues/21135 for tensor-based vectorization. That can happen later, since the first step is to make it functional.

[CPU] linalg.pack op is not fused in forall distribution

> [@hanhanW](https://github.com/hanhanW) I'll be glad if you could ping me if/when you're working on this. I'll also be interested to work on it and/or contribute at one point but I...

[CPU] linalg.pack op is not fused in forall distribution

There are some issues that I need to triage, e.g., additional stack buffers. And there is other work, like vectorization. But I verified that https://github.com/iree-org/iree/pull/21444 fixes the issue.

Padding failures after LLVM bump

I'll take a look when I have cycles. The pipeline is really not used at all, so it is a low priority.

[CPU] `transpose -> pack` folding pattern inhibits fusion

There is a potential memory-footprint issue when doing data-tiling. In your use case, it may increase the total memory usage if we end up with such a dispatch....

[CPU] `transpose -> pack` folding pattern inhibits fusion

> Do you mean because the transpose op possibly has multiple encodings that each get hoisted/cloned into initializers and potentially get duplicates?

No. What I meant is that the transpose...