[TRACKER] V-Diffusion-PyTorch Model

Open vivekkhandelwal1 opened this issue 3 years ago • 7 comments

Hi

I am working on supporting the v-diffusion-pytorch model via torch-mlir. So far, a simplified version of the model compiles successfully through torch-mlir, but the complete version does not. Compiling it fails with the following error:

Traceback (most recent call last):
  File "v_diffusion_with_sampling.py", line 127, in <module>
    module = torch_mlir.compile(
  File "/home/vivek/work/02_07/vivekkhandelwal1-SHARK/shark.venv/lib/python3.8/site-packages/torch_mlir/__init__.py", line 212, in compile
    mb.import_module(scripted._c, class_annotator, import_options)
ValueError: Unhandled tensor that shares storage with another tensor.
Found at path '<root>._tensor_constant0' from root object '__torch__.torch.fx.graph_module.model_inference'

The model is available at https://github.com/nod-ai/SHARK/tree/main/tank/pytorch/v_diffusion. To test or work on the model, please follow the installation instructions available at: https://github.com/nod-ai/SHARK/tree/main/tank/pytorch/v_diffusion#installation

To run the PyTorch version of the model, run the following command: ./v-diffusion-pytorch/cfg_sample.py "the rise of consciousness":5 -n 5 -bs 5 --seed 0

To compile the simplified version of the model via torch-mlir, run the script v_diffusion.py with the following command: python v_diffusion.py 2> v_diffusion_ir.mlir

To reproduce the error, run the script v_diffusion_with_sampling.py with the following command: python v_diffusion_with_sampling.py

vivekkhandelwal1 avatar Aug 09 '22 08:08 vivekkhandelwal1

The above error is caused by this line of code: https://github.com/crowsonkb/v-diffusion-pytorch/blob/master/diffusion/models/cc12m_1.py#L91. If I comment out this part of the code and return a dummy tensor instead, compilation ends in a segmentation fault.

vivekkhandelwal1 avatar Aug 09 '22 16:08 vivekkhandelwal1

@silvasean Your suggestions are welcome!

vivekkhandelwal1 avatar Aug 09 '22 16:08 vivekkhandelwal1

This is due to having two tensors in the program that share storage but aren't the exact same tensor (e.g. they are two different views of the same underlying tensor). You can make the logic here smarter, but it could get quite complicated: https://github.com/llvm/torch-mlir/blob/e322f6a8784009b37aa354abfa9a40a80f30877d/python/torch_mlir/dialects/torch/importer/jit_ir/csrc/ivalue_importer.cpp#L224

We have a minimal test here: https://github.com/llvm/torch-mlir/blob/main/test/python/importer/jit_ir/ivalue_import/object-identity-error.py#L17

I would recommend first identifying the two tensors that share storage and seeing how they are related. Is one a view of the other? Etc. Based on that, you then need to make ivalue_importer.cpp smart enough to recognize their relationship and emit them correctly (e.g. emit a "view" op to represent one in terms of the other).
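
A minimal sketch of that first step (finding which tensors share storage), for illustration only. The helper names `storage_ptr` and `find_shared_storage` and the toy tensors are hypothetical and not part of torch-mlir or the model; the sketch only assumes a PyTorch install and groups tensors by the base address of their underlying storage.

```python
from collections import defaultdict

import torch


def storage_ptr(t):
    """Base address of the storage backing tensor `t`."""
    # untyped_storage() is available in PyTorch 2.x; older releases
    # (such as those current when this issue was filed) expose storage().
    if hasattr(t, "untyped_storage"):
        return t.untyped_storage().data_ptr()
    return t.storage().data_ptr()


def find_shared_storage(named_tensors):
    """Return groups of (name, offset, shape, stride) that share one storage."""
    groups = defaultdict(list)
    for name, t in named_tensors.items():
        info = (name, t.storage_offset(), tuple(t.shape), tuple(t.stride()))
        groups[storage_ptr(t)].append(info)
    # Only groups with more than one member are interesting here.
    return [members for members in groups.values() if len(members) > 1]


# Toy stand-ins for module constants: `tail` is a view of `base`.
base = torch.arange(12, dtype=torch.float32)
tensors = {
    "base": base,
    "tail": base[6:],          # same storage, offset 6
    "other": torch.zeros(3),   # separate storage
}

shared = find_shared_storage(tensors)
for group in shared:
    for name, offset, shape, stride in group:
        print(name, "offset", offset, "shape", shape, "stride", stride)
```

The offset, shape, and stride printed for each member are what would tell you whether one tensor can be expressed as a view of the other. For a real model you would pass in the module's own constants and buffers instead of the toy dictionary.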

silvasean avatar Aug 11 '22 13:08 silvasean

The current blocker for this model is adding support for multiple indexing tensors in aten.index.Tensor with dynamic dimensions. I opened an issue discussing adding support for this case at #1226.
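
For readers unfamiliar with this case, here is a small sketch of what "multiple indexing tensors" means at the PyTorch level. The tensor names and shapes are made up for illustration and are static here, whereas the blocker described above concerns dimensions that are not known at compile time.

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)

# Two index tensors with different shapes; they are broadcast together
# to shape (2, 2) before indexing the first two dimensions of x.
rows = torch.tensor([0, 1])        # shape (2,)
cols = torch.tensor([[0], [2]])    # shape (2, 1)

# Python advanced indexing with tensor indices...
a = x[rows, cols]

# ...and the same operation invoked directly as aten.index.Tensor.
b = torch.ops.aten.index.Tensor(x, [rows, cols])

print(a.shape)            # indexed dims are replaced by the broadcast shape
print(torch.equal(a, b))  # both paths produce the same values
```

Each output element `a[i, j]` is `x[rows[j], cols[i, 0]]`, so the result shape is the broadcast index shape followed by the remaining dimension of `x`.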

qedawkins avatar Aug 15 '22 20:08 qedawkins

So is the blocker the "Unhandled tensor that shares storage with another tensor" error or aten.index.Tensor?

silvasean avatar Aug 15 '22 21:08 silvasean

Ah sorry I'm not up to date on the former error. @vivekkhandelwal1 can comment on whether that has been unblocked.

qedawkins avatar Aug 15 '22 22:08 qedawkins

So is the blocker the "Unhandled tensor that shares storage with another tensor" error or aten.index.Tensor?

aten.index.Tensor

vivekkhandelwal1 avatar Aug 17 '22 06:08 vivekkhandelwal1

Are there any remaining issues with V-Diffusion?

silvasean avatar Oct 07 '22 13:10 silvasean

Are there any remaining issues with V-Diffusion?

No

vivekkhandelwal1 avatar Oct 10 '22 11:10 vivekkhandelwal1

Are there any end-to-end examples for V-Diffusion?

tanyokwok avatar Oct 10 '22 11:10 tanyokwok

Are there any end-to-end examples for V-Diffusion?

It's available here: https://github.com/nod-ai/SHARK/tree/main/tank/pytorch/v_diffusion_pytorch

vivekkhandelwal1 avatar Oct 10 '22 11:10 vivekkhandelwal1