torch-mlir
[DO NOT MERGE] Prototype for external large weights.
This lowers large weights into ml_program.global ops.
Example usage:
import torch
import torch_mlir
import torchvision

resnet18 = torchvision.models.resnet18(pretrained=True)
resnet18.eval()
example_input = torch.ones(1, 3, 224, 224)
mlir_module = torch_mlir.compile(
    resnet18, example_input, output_type="torch",
    use_external_references_if_numel_exceeds=1)
print(mlir_module)
Backend contract IR: https://gist.github.com/silvasean/aba4b3b01b5a5649f1216dc6dd6942d4
Linalg-on-tensors IR: https://gist.github.com/silvasean/5ed3b6de238995e232ef3e2cb5a9f609
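For a rough idea of the output, here is a minimal hand-written sketch of the extern, immutable globals this produces in the ml_program dialect; it is not taken from the gists above, and the symbol name, shapes, and plain builtin tensor types are illustrative only:

```mlir
// Weight declared as an extern, immutable global: no tensor data is embedded in the IR.
ml_program.global private @resnet18.conv1.weight(#ml_program.extern<tensor<64x3x7x7xf32>>) : tensor<64x3x7x7xf32>

func.func @forward(%arg0: tensor<1x3x224x224xf32>) -> tensor<64x3x7x7xf32> {
  // The function body reads the weight with an immutable load.
  %w = ml_program.global_load_const @resnet18.conv1.weight : tensor<64x3x7x7xf32>
  return %w : tensor<64x3x7x7xf32>
}
```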
@powderluv here's something that should allow you to move forward with large weights. Still lots to do downstream from Torch-MLIR though (IREE etc.)
Hi @silvasean @jpienaar @powderluv - I tried using the "Converting to enable resetting variable" diff and was able to use it for weight swapping with IREE.
Since it was based on this PR, I'm initiating a discussion here.
I see that the differences between this PR and the one available as a diff in the above link are:
a. This PR ends up creating extern global variables with the `mutable` parameter set to `False`, etc., and we don't see the values of the weight resources in the IR (the goal of the PR), nor does any extern file as such get generated.
b. The diff (above link) of @jpienaar's implementation uses `mutable=True` resources, and the weight resources find their place within the MLIR file.
Since Jacques also mentioned a couple of points for improvement, I believe they can perhaps be taken as iterative improvements on the diff - i.e., ultimately shifting to the larger goal of COMPLETELY moving the weight resources out of the MLIR file.
- Can we get this PR merged AND still have it work along with IREE-generated accessors? I guess the `mutable` parameter being set to `False` here might be an issue. OR
- Can we aim to get the diff's implementation "merge"-ready?
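To make the mutability question concrete, here is a hedged sketch (illustrative only, not taken from either patch) of what a mutable global plus a setter would look like, which is roughly what swapping weights at runtime needs:

```mlir
// Mutable global with a splat default value; a setter can overwrite it after load time.
ml_program.global private mutable @fc.weight(dense<0.0> : tensor<1000x512xf32>) : tensor<1000x512xf32>

// Hypothetical setter used to swap the weight in at runtime.
func.func @set_fc_weight(%new: tensor<1000x512xf32>) {
  ml_program.global_store @fc.weight = %new : tensor<1000x512xf32>
  return
}

func.func @forward() -> tensor<1000x512xf32> {
  // Reads of a mutable global use global_load rather than global_load_const.
  %w = ml_program.global_load @fc.weight : tensor<1000x512xf32>
  return %w : tensor<1000x512xf32>
}
```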
I think there'll be a few iterations. My gist is "wrong" in that it marks all constants as mutable memory. Embedding the weights in the generated MLIR file is to get around the lack of a linkage specification, while not using a resource (which would be mmap'd and lead to much smaller files) was due to a missing C API (resolved now upstream, but I don't know if it is integrated yet). There is the additional part of just referencing the file directly; for that, River and I have talked about having a resource with file linkage information (not yet implemented, but the direct implementation is about 200 LoC given River's recent change; there are some optimizations I'd like to do there too, but first to get a baseline working).
For here I think it can be done in 3 parts without being too invasive:
- One needs a way to specify which values to make global memory (size-based is OK, but I don't think it really handles making them mutable);
- One needs to convert those that need to be mutable into mutable globals (this is not generally true of constants, and one can only mark a subset of the above as mutable, so the flagging in 1 is required and then a different selection in 2);
- File linkage to skip materializing - this one I think is the least important at the moment. You end up with a default model whose weights you can switch out; yes, it encodes all the default weights (well, one can easily make those splat constants too and then include a separate file... actually I'm not even sure why they can't be left alone until first set, and so why we need to have them specified, but this does mean that one gets a nice self-contained default file, and with resource encoding vs dense attribute encoding there is a good size improvement. BUT it's pretty much the same as today's constants with dense elements, hence why the others are more interesting).
Now with torch dynamo all weights become function args (a la TF/XLA style), so one explicitly passes them in. Conceptually, the above is just moving those weight params that one would have with torch dynamo into different setters, and then you have loads, while in the dynamo case you have extra args and stitching of values through call sites (the stitching is done for you, so no extra work needed).
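A hedged sketch of that contrast, with made-up names and shapes:

```mlir
// Dynamo-style: weights arrive as explicit function arguments, stitched through call sites.
func.func @forward_dynamo(%input: tensor<1x3x224x224xf32>,
                          %conv1_weight: tensor<64x3x7x7xf32>) -> tensor<64x3x7x7xf32> {
  return %conv1_weight : tensor<64x3x7x7xf32>
}

// Global-style: the same weight lives in a (possibly mutable) global and the body loads it.
ml_program.global private mutable @conv1_weight : tensor<64x3x7x7xf32>

func.func @forward_globals(%input: tensor<1x3x224x224xf32>) -> tensor<64x3x7x7xf32> {
  %w = ml_program.global_load @conv1_weight : tensor<64x3x7x7xf32>
  return %w : tensor<64x3x7x7xf32>
}
```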
So there are shorter-term and longer-term paths here. Most of the shorter-term ones aren't terrible. I think the user interface is probably the part that needs the most TLC (perhaps just a list of tensor constant values to make into globals); the rest one can have passes for (e.g., MarkVariableMutable or some such, which takes a list of globals and modifies const loads into loads) that run before IREE. I'd start with something like this pass, with constants encoded as init values, and then iterate to the final state. Given it's opt-in, nobody who doesn't use it would be paying for the complexity, and once the parts land one can again consider the end-user UI and see whether special cases can be reduced.
I think the longer-term path here is to use TorchDynamo, which eliminates the need for this patch altogether except for AoT use cases. The AoT use cases are a more complex design space, and I'd like to see the actual use cases that are not served by the Dynamo path.
Any shorter-term solutions (than Dynamo) will need to be guided by specific use cases and constraints.
Closed in favor of https://github.com/llvm/torch-mlir/pull/1793