
example of sd tensorrt model and Lora/ControlNet model fusion by refit?

Open chengzihua opened this issue 2 years ago • 11 comments

Description

I have tried to fuse the sd model and the torch model of lora/controlnet and then transfer to tensorrt, but how can I fuse the tensorrt model of sd and the lora/controlnet model in real time? Is there a sample?

chengzihua avatar Jun 02 '23 04:06 chengzihua

If you can export them in a single ONNX model, I think you are done. Please correct me if I have misunderstood.

zerollzeng avatar Jun 04 '23 14:06 zerollzeng

Or you can build multiple engines and put them inside a cuda stream.

zerollzeng avatar Jun 04 '23 14:06 zerollzeng

I need to swap in different LoRA models in real time. Fusing the SD model with a LoRA model and then converting it takes a long time, so I would like to use refit to replace the weights in the SD base engine instead. Could you provide an example of this?

chengzihua avatar Jun 05 '23 06:06 chengzihua

@chengzihua you can find a refit sample at https://github.com/NVIDIA/TensorRT/tree/release/8.6/samples/python/engine_refit_onnx_bidaf
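The linked sample covers the general flow; with TensorRT's Python API, refitting boils down to roughly the following (a minimal sketch, not a complete implementation — the engine file name and the `fused_weights` dictionary are hypothetical, weight names must match those the network was built with, and the engine must have been built with `trt.BuilderFlag.REFIT` set):

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# Deserialize an engine that was built with trt.BuilderFlag.REFIT enabled.
# "sd_unet.engine" is a hypothetical path for this sketch.
with open("sd_unet.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

refitter = trt.Refitter(engine, logger)

# Hypothetical: {weight_name: np.ndarray} with the base weights and the
# LoRA deltas already merged on the host. The names must match the
# refittable weights reported by refitter.get_all_weights().
fused_weights = {}

for name, value in fused_weights.items():
    refitter.set_named_weights(name, trt.Weights(np.ascontiguousarray(value)))

# Updates the weights inside the existing engine in place — no rebuild.
assert refitter.refit_cuda_engine()
```

This avoids rebuilding the engine per LoRA, which is the slow step the question is about; only the host-side weight merge and the refit call run per swap.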

BowenFu avatar Jun 06 '23 07:06 BowenFu

I have the same confusion. How did you solve it?

hx621 avatar Oct 09 '23 06:10 hx621

@hx621 you can check this sample and related codes. https://github.com/NVIDIA/TensorRT/tree/release/9.0/demo/Diffusion#generate-an-image-guided-by-a-text-prompt-and-using-specified-lora-model-weight-updates

BowenFu avatar Oct 09 '23 09:10 BowenFu

Thanks for your reply, I will check it.

hx621 avatar Oct 10 '23 03:10 hx621

Unfortunately, there is still a lack of solutions for dynamic LoRA fusion. A feasible but heavy workaround is to fuse the LoRA weights/biases into the base model before exporting to ONNX, similar to the sample and related code linked above. However, each export saves a new, large model, and given the arbitrary combinations of different LoRA modules, this puts significant pressure on storage.
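The per-export fusion being described is just the standard LoRA merge, applied to each adapted layer before ONNX export. A minimal sketch with NumPy (function and parameter names here are illustrative, not from any of the linked samples):

```python
# Sketch: fold a LoRA update into a base weight matrix on the host,
# so the exported/refitted model needs no extra LoRA branches.
import numpy as np

def fuse_lora(W, lora_A, lora_B, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the standard LoRA merge.

    W:      (out_features, in_features) base weight
    lora_A: (rank, in_features) down-projection
    lora_B: (out_features, rank) up-projection
    """
    return W + (alpha / rank) * (lora_B @ lora_A)

# Example: a 4x8 base layer with a rank-2 LoRA update.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
A = rng.standard_normal((2, 8))
B = rng.standard_normal((4, 2))
W_fused = fuse_lora(W, A, B, alpha=4.0, rank=2)
```

Doing this merge per layer and then refitting the fused arrays into an existing engine avoids both the per-LoRA ONNX export and the engine rebuild; only one base engine needs to stay on disk.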

lxp3 avatar Oct 17 '23 17:10 lxp3

You can try removing the stale models after refitting from them to save some storage.

BowenFu avatar Oct 18 '23 02:10 BowenFu

Yes, thank you. Btw, it would be exciting to see a future implementation in ONNX/TensorRT that lets us merge LoRA as a plug-in module as flexibly as in PyTorch :)

lxp3 avatar Oct 18 '23 03:10 lxp3

Hi @lxp3, would you mind sharing your solution for merging LoRA into a TensorRT engine?

bigmover avatar May 29 '24 09:05 bigmover