torch-mlir
[DO NOT MERGE] Prototype for external large weights.
This lowers large weights into ml_program.global ops.
Example usage:
import torch
import torch_mlir
import torchvision

resnet18 = torchvision.models.resnet18(pretrained=True)
resnet18.eval()
example_input = torch.ones(1, 3, 224, 224)
mlir_module = torch_mlir.compile(
    resnet18, example_input, output_type="torch",
    use_external_references_if_numel_exceeds=1)
print(mlir_module)
Backend contract IR: https://gist.github.com/silvasean/aba4b3b01b5a5649f1216dc6dd6942d4
Linalg-on-tensors IR: https://gist.github.com/silvasean/5ed3b6de238995e232ef3e2cb5a9f609
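For a rough idea of the output, here is a minimal hand-written sketch of the extern, immutable globals this produces in the ml_program dialect; it is not taken from the gists above, and the symbol name, shapes, and plain builtin tensor types are illustrative only:

```mlir
// Weight declared as an extern, immutable global: no tensor data is embedded in the IR.
ml_program.global private @resnet18.conv1.weight(#ml_program.extern<tensor<64x3x7x7xf32>>) : tensor<64x3x7x7xf32>

func.func @forward(%arg0: tensor<1x3x224x224xf32>) -> tensor<64x3x7x7xf32> {
  // The function body reads the weight with an immutable load.
  %w = ml_program.global_load_const @resnet18.conv1.weight : tensor<64x3x7x7xf32>
  return %w : tensor<64x3x7x7xf32>
}
```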
@powderluv here's something that should allow you to move forward with large weights. Still lots to do downstream from Torch-MLIR though (IREE etc.)
Hi @silvasean @jpienaar @powderluv - I tried using the "Converting to enable resetting variable" diff and was able to use it for weight swapping with IREE.
Since it was based on this PR, I'm initiating a discussion here.
I see that the differences between this PR and the one available as a diff in the above link are:
a. This PR ends up creating extern global variables with the `mutable` parameter set to `False`, etc., and we don't see the values of the weight resources in the IR (the goal of the PR), nor does any extern file as such get generated.
b. The diff (above link) of @jpienaar's implementation uses `mutable=True` resources, and the weight resources find their place within the MLIR file.
Since Jacques also mentioned a couple of points for improvement, I believe they can perhaps be taken as iterative improvements on the diff - i.e., ultimately shifting to the larger goal of COMPLETELY moving the weight resources out of the MLIR file.
- Can we get this PR merged AND still have it work along with IREE-generated accessors? I guess the `mutable` parameter being set to `False` here might be an issue. OR
- Can we aim to get the diff's implementation "merge"-ready?
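To make the mutability question concrete, here is a hedged sketch (illustrative only, not taken from either patch) of what a mutable global plus a setter would look like, which is roughly what swapping weights at runtime needs:

```mlir
// Mutable global with a splat default value; a setter can overwrite it after load time.
ml_program.global private mutable @fc.weight(dense<0.0> : tensor<1000x512xf32>) : tensor<1000x512xf32>

// Hypothetical setter used to swap the weight in at runtime.
func.func @set_fc_weight(%new: tensor<1000x512xf32>) {
  ml_program.global_store @fc.weight = %new : tensor<1000x512xf32>
  return
}

func.func @forward() -> tensor<1000x512xf32> {
  // Reads of a mutable global use global_load rather than global_load_const.
  %w = ml_program.global_load @fc.weight : tensor<1000x512xf32>
  return %w : tensor<1000x512xf32>
}
```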
I think there'll be a few iterations. My gist is "wrong" in that it marks all constants as mutable memory. Embedding the weights in the generated MLIR file is to get around the lack of a linkage specification, while not using a resource (which would be mmap'd and lead to much smaller files) was due to a missing C API (resolved now upstream, but I don't know if it is integrated yet). There is the additional part of just referencing the file directly; for that, River and I have talked about having a resource with file linkage information (not yet implemented, but the direct implementation is about 200 LoC given River's recent change; there are some optimizations I'd like to do there too, but first to get a baseline working).
For here I think it can be done in 3 parts without being too invasive:
- One needs a way to specify which values to make global memory (size-based is OK, but I don't think it really handles making them mutable);
- One needs to convert those that need to be mutable into mutable globals (this is not generally true of constants, and one can only mark a subset of the above as mutable, so the flagging in 1 is required and then a different selection in 2);
- File linkage to skip materializing - this one I think is the least important at the moment. You end up with a default model whose weights you can switch out; yes, it encodes all the default weights (well, one can easily make those splat constants too and then include a separate file... actually I'm not even sure why they can't be left alone until first set, and so why we need to have them specified, but this does mean that one gets a nice self-contained default file, and with resource encoding vs dense attribute encoding there is a good size improvement. BUT it's pretty much the same as today's constants with dense elements, hence why the others are more interesting).
Now with torch dynamo all weights become function args (a la TF/XLA style), so one explicitly passes them in. Conceptually, the above is just moving those weight params that one would have with torch dynamo into different setters, and then you have loads, while in the dynamo case you have extra args and stitching of values through call sites (the stitching is done for you, so no extra work needed).
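A hedged sketch of that contrast, with made-up names and shapes:

```mlir
// Dynamo-style: weights arrive as explicit function arguments, stitched through call sites.
func.func @forward_dynamo(%input: tensor<1x3x224x224xf32>,
                          %conv1_weight: tensor<64x3x7x7xf32>) -> tensor<64x3x7x7xf32> {
  return %conv1_weight : tensor<64x3x7x7xf32>
}

// Global-style: the same weight lives in a (possibly mutable) global and the body loads it.
ml_program.global private mutable @conv1_weight : tensor<64x3x7x7xf32>

func.func @forward_globals(%input: tensor<1x3x224x224xf32>) -> tensor<64x3x7x7xf32> {
  %w = ml_program.global_load @conv1_weight : tensor<64x3x7x7xf32>
  return %w : tensor<64x3x7x7xf32>
}
```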
So there are shorter-term and longer-term paths here. Most of the shorter-term ones aren't terrible. I think the user interface is probably the part that needs the most TLC (perhaps just a list of tensor constant values to make into globals); the rest one can have passes for (e.g., MarkVariableMutable or some such, which takes a list of globals and modifies const loads into loads) that run before IREE. I'd start with something like this pass, with constants encoded as init values, and then iterate to the final state. Given it's opt-in, nobody who doesn't use it would be paying for the complexity, and once the parts land one can again consider the end-user UI and see whether special cases can be reduced.
I think the longer-term path here is to use TorchDynamo, which eliminates the need for this patch altogether except for AoT use cases. The AoT use cases are a more complex design space, and I'd like to see the actual use cases that are not served by the Dynamo path.
Any shorter-term solutions (than Dynamo) will need to be guided by specific use cases and constraints.
Closed in favor of https://github.com/llvm/torch-mlir/pull/1793