feat: use MLIR resources to capture large constants in the IR
https://github.com/google/heir/pull/1501 adds short-term support for serializing large constants at code emission time for the openfhe backend. To support this in a backend-agnostic way, and throughout the compilation process rather than only at emission, it would be a good idea to use MLIR dialect resource blobs to handle the large constants that slow compilation times.
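For context, the problem case is an IR file carrying model weights inline as textual dense elements attributes, which every parse/print round trip has to process in full. A hypothetical example (shape and hex payload made up, and elided):

```mlir
// A large weight tensor stored inline as a textual dense attribute.
// Parsing and re-printing this blob on every pipeline invocation is a
// major contributor to slow compile times for large models.
%weights = arith.constant dense<"0x..."> : tensor<1024x1024xf32>
```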
@ZenithalHourlyRate laid out a good way to support this earlier in the pipeline (rather than only at one backend's code emission):
- [ ] A pass to transform `arith.constant dense<blob_string>` into `arith.constant dense_resource<resource_key>` plus a trailing resource section in the file. Optionally, split the IR and the resources into two files, since MLIR allows a `dense_resource` without the referenced value present in the IR.
- [ ] `linalg-to-tensor-ext` is provided with the resource file to operate on. Some passes may also fold or canonicalize tensor operations (e.g. loop unrolling?).
- [ ] Code emission to a particular backend from the split resource file should be done by other tools, say `mlirResourceToCereal` / `mlirResourceToMlirByteCode`.
- [ ] We can support importing constants in an `OnnxToMlirResource`-ish way.
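As a rough sketch of the first item (the resource key `large_const_0` and the tiny tensor here are made up for illustration), the pass would rewrite an inline constant into a resource handle plus a trailing resource section:

```mlir
// Before: element values are inlined in the attribute.
%0 = arith.constant dense<[1, 2, 3, 4]> : tensor<4xi32>

// After: the op references a named blob stored out of line.
%0 = arith.constant dense_resource<large_const_0> : tensor<4xi32>

// Trailing resource section. With the optional split, this section could
// live in a separate file from the IR above.
{-#
  dialect_resources: {
    builtin: {
      // If I'm reading the blob encoding right, the leading 4-byte word
      // encodes the alignment, followed by raw little-endian element data.
      large_const_0: "0x0400000001000000020000000300000004000000"
    }
  }
#-}
```

This is the same `dialect_resources` mechanism torch-mlir already emits, so the importer side may come for free.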
I wanted to revisit this since torch-mlir actually outputs dense resources for the large constants in the IR.
For large models, we may want to split the model weight processing into a separate function. I'm still not sure what format we want to use to serialize the weights to disk, though. For the torch C++ examples, we used the torch library, and the user would load serialized torch tensors.