bonlime
First, thanks for a very interesting paper. Looking through your code, I see that you use `_build_causal_attention_mask` and pass this attention mask to the text encoder during training, which indeed...
Hi! Thanks for a very interesting paper. I wonder if you've tried generating shorter/longer clips? I see that there is `temporal_position_encoding_max_len=24`, which limits the length to 24 frames, but...
Hi! First of all, thanks for a very good model. Stable Diffusion v2 used the `v-prediction` target and argued that it's better than the default `epsilon` prediction, but why do you...
Currently the SRVGG model can't be passed through `torch.jit.script`; this commit fixes it.
Hey, I've read through your papers and I like the idea of token merging. I've experimented a little bit with applications to Stable Diffusion and found one potential source of...
@MC-E Hey! First of all, thanks for a very interesting paper. I like your approach more than CN because it is much faster. Looking at Figure 5 in your paper...
Hi @inbarhub! First of all, thanks for a very good and interesting paper; I really enjoyed reading it. I wonder if it's possible to apply the derived noise maps to schedulers other...
Hey @cloneofsimo what's the license for the code in this repo?
Hey, I accidentally found your repo, and according to your commits you're working on layerwise textual inversion plus using LoRA to compress the embedding size. I've been thinking about...