Han-Chung Wang
I'm using this issue as the main issue for tracking "bring up llama8b fp8 on mi350". @Abhishek-Varma can you help generate the metrics similar to [this](https://github.com/iree-org/iree/issues/21195#issuecomment-3249643367)? So we can see...
Thanks @Abhishek-Varma! This is a good breakdown. Can you also add a column for e2e performance? A few questions: - I remember that there are no additional encoding dispatches. I.e.,...
> Listing down here the perf breakdown for non-data tiled vs data tiled compilation for llama 8b on gfx350. The IR has been obtained from [here](https://github.com/nod-ai/shark-ai/issues/2548#issuecomment-3444018705). > > No Data...
Closing the issue because we successfully brought up the model. The remaining work is about performance, so let's move the discussion to https://github.com/iree-org/iree/issues/21958 (I moved the last three comments to...
FYI, I'm considering revamping https://github.com/iree-org/iree/pull/17530 for CPU backends. It is a more aggressive version that may flatten something like `tensor`, depending on the native vector size.
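To give a rough idea of the kind of flattening meant here (this is only a sketch, not the actual output of that PR; the shapes and function name are made up), a 2-D tensor whose inner dimension lines up with the native vector size could be collapsed into a 1-D tensor with `tensor.collapse_shape`:

```mlir
// Illustrative only: collapse a 4x8 tensor into a flat 32-element tensor,
// assuming the inner dimension (8) matches the native vector size.
func.func @flatten_example(%src: tensor<4x8xf32>) -> tensor<32xf32> {
  %flat = tensor.collapse_shape %src [[0, 1]]
      : tensor<4x8xf32> into tensor<32xf32>
  return %flat : tensor<32xf32>
}
```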
Have you tried removing all those files from [BUILD.bazel](https://github.com/llvm/torch-mlir/blob/main/utils/bazel/torch-mlir-overlay/BUILD.bazel)?