Han-Chung Wang

Results: 336 comments by Han-Chung Wang

I'm using this issue as the main issue for tracking "bring up llama8b fp8 on mi350". @Abhishek-Varma can you help generate the metrics similar to [this](https://github.com/iree-org/iree/issues/21195#issuecomment-3249643367)? So we can see...

Thanks @Abhishek-Varma! This is a good breakdown. Can you also add a column for e2e performance? A few questions: - I remember that there are no additional encoding dispatches. I.e.,...

> Listing down here the perf breakdown for non-data tiled vs data tiled compilation for llama 8b on gfx350. The IR has been obtained from [here](https://github.com/nod-ai/shark-ai/issues/2548#issuecomment-3444018705). > > No Data...

Closing the issue because we successfully brought up the model. The remaining work is about performance, so let's move the discussion to https://github.com/iree-org/iree/issues/21958 (I moved the last three comments to...

FYI, I'm considering revamping https://github.com/iree-org/iree/pull/17530 for CPU backends. It is a more aggressive version that may flatten something like `tensor`, depending on the native vector size.

Have you tried removing all those files from [BUILD.bazel](https://github.com/llvm/torch-mlir/blob/main/utils/bazel/torch-mlir-overlay/BUILD.bazel)?