[torch-frontend] add stablehlo IRs for Mixtral model.
This PR provides the stablehlo IR of a single Mixtral decoder layer, generated with the ByteIR stack. The IR was printed with the --mlir-elide-resource-strings-if-larger=1000 option, so dialect resources that store model weights are elided when they exceed that size and do not all appear in the IR.
Note: the compilation currently succeeds only with some local patches.
- We eliminate torch.runtime.assert during the stablehlo conversion, as we haven't decided how to handle it yet.
- We need the patches from PR 3322 and PR 3085 in torch-mlir.
Update on 2024.05.31:
We add the stablehlo IR of the whole Mixtral 8x7B model. Note that, to save compilation time and memory consumption, we convert the large weights into splat DenseElementsAttrs, so the IR does not carry the real weight values. See frontends/torch-frontend/examples/inference/mixtral/infer_single_mixtral.py for how to run it.
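To illustrate why splat weights reduce memory consumption, here is a minimal sketch in plain Python. It is not ByteIR or MLIR code: `SplatTensor` is a hypothetical stand-in for a splat constant, and the shape and 16-bit element width are assumptions chosen only for illustration. The idea it models is that a splat stores one value plus a shape, instead of one value per element.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class SplatTensor:
    """Hypothetical model of a splat constant: a shape plus one repeated value."""

    shape: tuple
    value: float
    bytes_per_element: int = 2  # assumption: a 16-bit element type

    def dense_bytes(self) -> int:
        # Storage needed if every element were stored explicitly.
        return prod(self.shape) * self.bytes_per_element

    def splat_bytes(self) -> int:
        # Storage needed for the single repeated value.
        return self.bytes_per_element

    def __getitem__(self, index):
        # Every in-bounds element reads back as the same value.
        if not isinstance(index, tuple):
            index = (index,)
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        if any(not 0 <= i < n for i, n in zip(index, self.shape)):
            raise IndexError("index out of range")
        return self.value


# Illustrative shape only; not taken from the IR in this PR.
weight = SplatTensor(shape=(14336, 4096), value=0.0)
print("dense bytes:", weight.dense_bytes())
print("splat bytes:", weight.splat_bytes())
```

The trade-off is that every element of such a weight reads back as the same value, so the IR is useful for exercising the compilation pipeline rather than for reproducing the model's numerical outputs.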