[torch-frontend] add stablehlo IRs for Mixtral model.

Open Vremold opened this issue 1 year ago • 1 comment

In this PR, we provide the stablehlo IR of a single Mixtral decoder layer, generated with the ByteIR stack. The IR was printed with the --mlir-elide-resource-strings-if-larger=1000 option, so the dialect resources that store the model weights are not all shown in full.

Note: we applied some local patches to make the compilation succeed:

- We eliminate torch.runtime.assert during the stablehlo conversion, as we haven't decided how to handle it yet.
- We need the patches from PR 3322 and PR 3085 in torch-mlir.

May 16 '24 17:05 Vremold

Update on 2024.05.31.

We have added the stablehlo IR of a whole Mixtral 8x7B model. Note that, to save compilation time and memory, we convert the large weights into splat DenseElementsAttrs, so each such weight is stored as a single repeated value rather than its full data. See frontends/torch-frontend/examples/inference/mixtral/infer_single_mixtral.py for how to run it.
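To illustrate the splat idea, here is a minimal Python sketch. It is not the ByteIR or torch-frontend implementation; the helper names `splat_attr` and `maybe_splat`, the element-count threshold, and the example shape are all hypothetical. It only shows how a large weight can be described by its shape, element type, and one repeated value, using MLIR's textual form for a splat dense attribute.

```python
from math import prod


def splat_attr(shape, dtype="f16", value=0.0):
    """Return the MLIR text of a splat dense attribute.

    Hypothetical helper for illustration: a splat stores one value that is
    logically repeated for every element of the tensor.
    """
    # Join the dimensions and the element type, e.g. "14336x4096xf16".
    # An empty shape (rank-0 tensor) yields just the element type.
    type_body = "x".join([str(d) for d in shape] + [dtype])
    return f"dense<{value:e}> : tensor<{type_body}>"


def maybe_splat(shape, dtype, threshold=1000):
    """Return a splat attribute for large weights, or None for small ones.

    The threshold is an assumed element count, not a value taken from the PR.
    """
    if prod(shape) > threshold:
        return splat_attr(shape, dtype)
    return None  # small weights would keep their real data


if __name__ == "__main__":
    # Illustrative shape only; real weight shapes come from the model config.
    print(maybe_splat((14336, 4096), "f16"))
    print(maybe_splat((4,), "f32"))
```

Because a splat discards the trained values, IR produced this way is presumably meant for exercising the compilation pipeline rather than for checking numerical results against the original model.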

May 30 '24 16:05 Vremold