Patrick von Platen
Please let me know if this didn't fully answer your questions @LinxiFan !
Sorry, @patil-suraj is 100% correct, I was a bit thrown off by a rather unusual (not power of 2) number, like 77 -> but it's indeed the maximum length and...
Hey @LinxiFan, It seems like the original CompVis repo simply truncates longer inputs to the max length, which is 77 (check [here](https://github.com/CompVis/stable-diffusion/blob/3fbbf28a8a66d6f0ffcaca43a76521b6d9b5bfb3/ldm/modules/encoders/modules.py#L153)). This is the same as what we've now implemented...
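To make the truncation behaviour concrete, here is a minimal, self-contained sketch of cutting a token sequence down to a fixed maximum length of 77. It is not the CompVis or `diffusers` code: the function name `truncate_and_pad`, the placeholder ids for the start, end, and padding markers, and the choice to keep an end marker after truncation are all assumptions made for illustration.

```python
from typing import List

MAX_LENGTH = 77  # maximum text length mentioned in the comment above


def truncate_and_pad(
    token_ids: List[int],
    bos_id: int,
    eos_id: int,
    pad_id: int,
    max_length: int = MAX_LENGTH,
) -> List[int]:
    """Hypothetical helper: wrap ids in start/end markers, truncate, then pad.

    The marker ids are placeholders supplied by the caller, not the values
    used by any real tokenizer.
    """
    # Reserve two slots for the start and end markers, drop everything beyond.
    body = token_ids[: max_length - 2]
    ids = [bos_id] + body + [eos_id]
    # Pad short prompts so every output has exactly max_length entries.
    ids += [pad_id] * (max_length - len(ids))
    return ids


if __name__ == "__main__":
    long_prompt = list(range(100))  # stand-in for an over-long tokenized prompt
    out = truncate_and_pad(long_prompt, bos_id=-1, eos_id=-2, pad_id=0)
    print(len(out))
```

With an over-long input, everything past the limit is silently dropped, so words near the end of a very long prompt would not influence the result under this scheme.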
Regarding the rectangular image - sorry I misunderstood! This is indeed a functionality we have not yet implemented in the `StableDiffusionPipeline`! @anton-l @patil-suraj - let's implement it as well? (see:...
Already on it seems :heart_eyes: https://github.com/huggingface/diffusers/pull/179
I think we can close this issue now since #179 is merged
Sounds like a cool idea @ethancohen123! @anton-l if we have a text2image fine-tuning / training script it would be quite trivial to experiment with such ideas in `diffusers`
Hey @UdonDa, That's a good question! I'm sadly not too familiar with `TensorRT`, but diffusion processes do indeed often suffer from slow inference. I'd be happy to allow integrations with `TensorRT`,...
Hey, Hmm if it requires major code additions, it might be a bit too early to add to the library at this stage. Happy to help if you're interested in...
Cool, also linking this to our current speed-up PRs:
- https://github.com/huggingface/diffusers/pull/532
- https://github.com/huggingface/diffusers/pull/371
- https://github.com/huggingface/diffusers/pull/511