
example of sd tensorrt model and Lora/ControlNet model fusion by refit?

Open chengzihua opened this issue 2 years ago • 11 comments

Description

I have tried to fuse the sd model and the torch model of lora/controlnet and then transfer to tensorrt, but how can I fuse the tensorrt model of sd and the lora/controlnet model in real time? Is there a sample?

chengzihua avatar Jun 02 '23 04:06 chengzihua

If you can export them in a single ONNX model, I think you are done. Please correct me if I have misunderstood.

zerollzeng avatar Jun 04 '23 14:06 zerollzeng

Or you can build multiple engines and put them inside a cuda stream.

zerollzeng avatar Jun 04 '23 14:06 zerollzeng

I need to swap in different LoRA models in real time. Fusing the SD model with a LoRA model and then converting it takes a long time, so I would like to use refit to replace the weights in the SD base engine instead. Could you provide an example of this?

chengzihua avatar Jun 05 '23 06:06 chengzihua

@chengzihua you can find a refit sample at https://github.com/NVIDIA/TensorRT/tree/release/8.6/samples/python/engine_refit_onnx_bidaf
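The linked sample covers the general flow; with TensorRT's Python API, refitting boils down to roughly the following (a minimal sketch, not a complete implementation — the engine file name and the `fused_weights` dictionary are hypothetical, weight names must match those the network was built with, and the engine must have been built with `trt.BuilderFlag.REFIT` set):

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# Deserialize an engine that was built with trt.BuilderFlag.REFIT enabled.
# "sd_unet.engine" is a hypothetical path for this sketch.
with open("sd_unet.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

refitter = trt.Refitter(engine, logger)

# Hypothetical: {weight_name: np.ndarray} with the base weights and the
# LoRA deltas already merged on the host. The names must match the
# refittable weights reported by refitter.get_all_weights().
fused_weights = {}

for name, value in fused_weights.items():
    refitter.set_named_weights(name, trt.Weights(np.ascontiguousarray(value)))

# Updates the weights inside the existing engine in place — no rebuild.
assert refitter.refit_cuda_engine()
```

This avoids rebuilding the engine per LoRA, which is the slow step the question is about; only the host-side weight merge and the refit call run per swap.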

BowenFu avatar Jun 06 '23 07:06 BowenFu

I have the same confusion. How did you solve it?

hx621 avatar Oct 09 '23 06:10 hx621

@hx621 you can check this sample and related codes. https://github.com/NVIDIA/TensorRT/tree/release/9.0/demo/Diffusion#generate-an-image-guided-by-a-text-prompt-and-using-specified-lora-model-weight-updates

BowenFu avatar Oct 09 '23 09:10 BowenFu

Thanks for your reply, I will check it.

hx621 avatar Oct 10 '23 03:10 hx621

Unfortunately, there is still a lack of solutions for dynamic LoRA fusion. A feasible but heavy workaround is to fuse the LoRA weights/biases into the base model before exporting to ONNX, similar to the sample and related code linked above. However, each export saves a new, large model, and given the arbitrary combinations of different LoRA modules, this puts significant pressure on storage.
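The per-export fusion being described is just the standard LoRA merge, applied to each adapted layer before ONNX export. A minimal sketch with NumPy (function and parameter names here are illustrative, not from any of the linked samples):

```python
# Sketch: fold a LoRA update into a base weight matrix on the host,
# so the exported/refitted model needs no extra LoRA branches.
import numpy as np

def fuse_lora(W, lora_A, lora_B, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the standard LoRA merge.

    W:      (out_features, in_features) base weight
    lora_A: (rank, in_features) down-projection
    lora_B: (out_features, rank) up-projection
    """
    return W + (alpha / rank) * (lora_B @ lora_A)

# Example: a 4x8 base layer with a rank-2 LoRA update.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
A = rng.standard_normal((2, 8))
B = rng.standard_normal((4, 2))
W_fused = fuse_lora(W, A, B, alpha=4.0, rank=2)
```

Doing this merge per layer and then refitting the fused arrays into an existing engine avoids both the per-LoRA ONNX export and the engine rebuild; only one base engine needs to stay on disk.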

lxp3 avatar Oct 17 '23 17:10 lxp3

You can try removing the stale models after refitting from them to save some storage.

BowenFu avatar Oct 18 '23 02:10 BowenFu

Yes, thank you. Btw, it would be exciting to see a future implementation in ONNX/TensorRT that lets us merge LoRA as a plug-in module as flexibly as in PyTorch :)

lxp3 avatar Oct 18 '23 03:10 lxp3

Hi @lxp3, would you mind sharing your solution for merging LoRA into a TensorRT engine?

bigmover avatar May 29 '24 09:05 bigmover