Han-Chung Wang
I'm using this issue as the main issue for tracking "bring up llama8b fp8 on mi350". @Abhishek-Varma can you help generate the metrics similar to [this](https://github.com/iree-org/iree/issues/21195#issuecomment-3249643367)? So we can see...
Thanks @Abhishek-Varma! This is a good breakdown. Can you also add a column for e2e performance? A few questions: - I remember that there are no additional encoding dispatches. I.e.,...
> Listing down here the perf breakdown for non-data tiled vs data tiled compilation for llama 8b on gfx350. The IR has been obtained from [here](https://github.com/nod-ai/shark-ai/issues/2548#issuecomment-3444018705). > > No Data...
Closing the issue because we successfully brought up the model. The remaining work is about performance, so let's move the discussion to https://github.com/iree-org/iree/issues/21958 (I moved the last three comments to...
FYI, I'm considering revamping https://github.com/iree-org/iree/pull/17530 for CPU backends. It is a more aggressive version that may flatten something like `tensor`, depending on the native vector size.
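To give a rough idea of the kind of flattening meant here (this is only a sketch, not the actual output of that PR; the shapes and function name are made up), a 2-D tensor whose inner dimension lines up with the native vector size could be collapsed into a 1-D tensor with `tensor.collapse_shape`:

```mlir
// Illustrative only: collapse a 4x8 tensor into a flat 32-element tensor,
// assuming the inner dimension (8) matches the native vector size.
func.func @flatten_example(%src: tensor<4x8xf32>) -> tensor<32xf32> {
  %flat = tensor.collapse_shape %src [[0, 1]]
      : tensor<4x8xf32> into tensor<32xf32>
  return %flat : tensor<32xf32>
}
```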
Have you tried removing all those files from [BUILD.bazel](https://github.com/llvm/torch-mlir/blob/main/utils/bazel/torch-mlir-overlay/BUILD.bazel)?