Implementation for TensorRT

Open UdonDa opened this issue 3 years ago • 2 comments

Hi,

I guess the reverse diffusion process could run much faster if the UNet implementation in this library could run inference with TensorRT. Are there any plans to add TensorRT support?

UdonDa avatar Jun 28 '22 06:06 UdonDa

Hey @UdonDa,

That's a good question! I'm sadly not too familiar with TensorRT, but diffusion models do often suffer from slow inference. I'd be happy to allow integrations with TensorRT. Do you have an idea of how to do so? Also cc @anton-l

patrickvonplaten avatar Jun 28 '22 10:06 patrickvonplaten

Hi, @patrickvonplaten, @anton-l!

Actually, it is easy to implement with the Torch-TensorRT library: https://github.com/pytorch/TensorRT#python. The library provides a compiler, so we only need to compile our network instance.

However, I have two concerns. (1) I do not know whether the compiled network can still estimate the score function accurately. (2) The UNet implementation would need to be rewritten before it can be compiled; this only concerns how building blocks such as the resblocks are defined. I have tried to use TensorRT, but the current implementation cannot be compiled as-is. If that modification is made, the pretrained weights uploaded to the Hugging Face Hub may no longer load correctly.
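
To make the "just compile our network instance" step and concern (1) concrete, here is a minimal sketch. It assumes a toy network (`ToyDenoiser`, a hypothetical stand-in, not the diffusers UNet) and a hypothetical helper `try_compile_with_tensorrt`. The helper calls `torch_tensorrt.compile` only when CUDA and the `torch_tensorrt` package are available, and otherwise returns the model unchanged. Comparing outputs before and after compilation is one rough way to check accuracy.

```python
import torch
import torch.nn as nn


class ToyDenoiser(nn.Module):
    """Hypothetical stand-in for a UNet; NOT the diffusers implementation."""

    def __init__(self, channels: int = 3) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 8, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(8, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def try_compile_with_tensorrt(model: nn.Module, example: torch.Tensor) -> nn.Module:
    """Return a TensorRT-compiled module, or the original model if unavailable."""
    if not torch.cuda.is_available():
        return model  # TensorRT needs an NVIDIA GPU
    try:
        import torch_tensorrt
    except ImportError:
        return model  # package not installed; fall back to eager PyTorch
    return torch_tensorrt.compile(
        model.cuda(),  # moves the parameters to the GPU in place
        inputs=[torch_tensorrt.Input(shape=tuple(example.shape), dtype=torch.float32)],
        enabled_precisions={torch.float32},
    )


torch.manual_seed(0)
model = ToyDenoiser().eval()
example = torch.randn(1, 3, 32, 32)
compiled = try_compile_with_tensorrt(model, example)

# Run both versions on whatever device the model ended up on.
device = next(model.parameters()).device
with torch.no_grad():
    reference = model(example.to(device))
    output = compiled(example.to(device))

# Concern (1): how far does the compiled output drift from the reference?
max_abs_error = (reference - output).abs().max().item()
print(f"max abs error: {max_abs_error:.2e}")
```

With lower precisions such as `torch.half` in `enabled_precisions`, the error would be expected to grow, so the acceptable tolerance depends on the sampler and the model.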

UdonDa avatar Jun 28 '22 11:06 UdonDa

Hey,

Hmm, if it requires major code additions, it might be a bit too early to add this to the library at this stage. Happy to help if you're interested in adding some code though!

patrickvonplaten avatar Aug 23 '22 15:08 patrickvonplaten

This is only an intermediate step, but it is a necessary one for using TensorRT (through the torch_tensorrt library). Check out this implementation. Converting to TorchScript and applying layer fusion increases the speed by 50%, which is pretty cool. Do you want to try adding it to the official library? https://github.com/cloneofsimo/sd-various-ideas/blob/main/create_jit.ipynb
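
For readers who have not used TorchScript, the conversion step might look roughly like the sketch below. This is not the code from the linked notebook: `ToyResBlock` is a hypothetical stand-in for one UNet building block, and the sketch only shows `torch.jit.trace` followed by `torch.jit.freeze`, plus a check that the outputs still match. It makes no claim about speed.

```python
import torch
import torch.nn as nn


class ToyResBlock(nn.Module):
    """Hypothetical residual block; NOT the diffusers resblock."""

    def __init__(self, channels: int = 8) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.norm(self.conv1(x)))
        return x + self.conv2(h)


torch.manual_seed(0)
block = ToyResBlock().eval()  # freezing requires eval mode
example = torch.randn(1, 8, 16, 16)

with torch.no_grad():
    # Record the operations executed for this example input.
    traced = torch.jit.trace(block, example)
    # Inline parameters as constants so TorchScript can optimize the graph.
    frozen = torch.jit.freeze(traced)
    reference = block(example)
    scripted_out = frozen(example)

max_abs_error = (reference - scripted_out).abs().max().item()
print(f"max abs error after tracing and freezing: {max_abs_error:.2e}")
```

Tracing records a single execution path, so a model with data-dependent control flow would need `torch.jit.script` or some restructuring instead.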

batrlatom avatar Sep 13 '22 08:09 batrlatom

PhotoRoom wrote an awesome blog post on exactly how to do this: "Making stable diffusion 25% faster using TensorRT". They explain what is happening, provide all the sample code, and include performance metrics comparing the two models. They also reference the recent ONNX work in #284. Hope this helps!

gadicc avatar Sep 16 '22 12:09 gadicc

Cool, also linking this to our current speed-up PRs:

  • https://github.com/huggingface/diffusers/pull/532
  • https://github.com/huggingface/diffusers/pull/371
  • https://github.com/huggingface/diffusers/pull/511

patrickvonplaten avatar Sep 21 '22 14:09 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Oct 15 '22 15:10 github-actions[bot]