Alexandru Papiu
@albertfgu I just used learned positional encodings - the sequence length was 64 (a 32 by 32 image split into 4 by 4 patches gives an 8 by 8 grid, i.e. 64 tokens). I will try to reproduce the results...
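For context, a minimal sketch of learned positional encodings over those 64 patch tokens (names and dimensions are illustrative, not the exact repo code):

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Illustrative patch embedding with learned positional encodings.

    A 32x32 image split into 4x4 patches gives an 8x8 grid = 64 tokens.
    """
    def __init__(self, embed_dim=256, num_patches=64, patch_dim=4 * 4 * 3):
        super().__init__()
        self.proj = nn.Linear(patch_dim, embed_dim)
        # one learned vector per position, trained jointly with the rest of the model
        self.pos_emb = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

    def forward(self, patches):  # patches: (batch, 64, patch_dim)
        return self.proj(patches) + self.pos_emb
```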
Hey @metatl, try using the legacy_dh_order branch - the model was trained with a small but annoying difference in the ordering of the hidden dimensions and head dimensions, and unfortunately I...
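For anyone hitting the same thing, the kind of mismatch being described is roughly the following (a hypothetical illustration, not the exact code on either branch): whether the flat hidden dimension is split as (heads, head_dim) or (head_dim, heads) changes how checkpoint weights are interpreted, so a model trained with one ordering loads incorrectly under the other.

```python
import torch

batch, seq_len, n_heads, head_dim = 2, 64, 8, 32
x = torch.randn(batch, seq_len, n_heads * head_dim)

# ordering A: split the hidden dim as (heads, head_dim)
qa = x.view(batch, seq_len, n_heads, head_dim).transpose(1, 2)

# ordering B: split the hidden dim as (head_dim, heads)
qb = x.view(batch, seq_len, head_dim, n_heads).permute(0, 3, 1, 2)

# both end up (batch, heads, seq_len, head_dim), but they slice the hidden
# dimension differently, so weights trained one way are scrambled the other way
print(qa.shape, qb.shape, torch.allclose(qa, qb))  # same shapes, different values
```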
Hey @aabzaliev - interesting results. Agreed the results aren't great (but kind of interesting too). I think it could be a few things. 1. The noise distribution - the defaults...
Hey, the model uses torch.nn.functional.scaled_dot_product_attention (https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html), which should already use Flash Attention when it's available.
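Roughly, the attention call looks like this (a simplified sketch, not the repo's exact module; PyTorch dispatches to the Flash Attention kernel automatically when the inputs and hardware support it):

```python
import torch
import torch.nn.functional as F

batch, n_heads, seq_len, head_dim = 2, 8, 64, 32
q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn_like(q)
v = torch.randn_like(q)

# PyTorch picks the fastest available backend for these inputs
# (Flash Attention, memory-efficient attention, or the math fallback);
# the flash kernel itself only kicks in for fp16/bf16 tensors on CUDA.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # (batch, n_heads, seq_len, head_dim)
```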
Hi @ericwudocomoi, I added a [notebook](https://colab.research.google.com/drive/1sKk0usxEF4bmdCDcNQJQNMt4l9qBOeAM) in the https://github.com/apapiu/transformer_latent_diffusion?tab=readme-ov-file#usage section: if you look in the notebook, it will download some already preprocessed data, including the val encodings, and do a...
@ericwudocomoi Ok, I added another notebook in the https://github.com/apapiu/transformer_latent_diffusion?tab=readme-ov-file#usage subsection that should help you preprocess the images and text for your own dataset. Let me know if this helps and you're...
Yes it's a Keras version error. One solution is to save the model weights and then use the load_weights method on a newly instantiated model with the same architecture. I...
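A minimal sketch of that approach, assuming tf.keras (build_model here is just a stand-in for however you construct your architecture):

```python
import numpy as np
from tensorflow import keras

def build_model():
    # stand-in for your own model-construction code
    return keras.Sequential([
        keras.Input(shape=(8,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1),
    ])

# old environment: save only the weights, not the full serialized model
model = build_model()
model.save_weights("model.weights.h5")

# new Keras version: rebuild the same architecture in code, then load the weights
new_model = build_model()
new_model.load_weights("model.weights.h5")

x = np.zeros((1, 8), dtype="float32")
assert np.allclose(model.predict(x), new_model.predict(x))
```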
Hi - what dataset are you trying to use? For the text embedding you can use the get_text_encodings function and the images can just be resized to the appropriate size...
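If it helps, here is a generic sketch of the two preprocessing steps, with the text side done directly with a CLIP model from Hugging Face transformers as a stand-in for get_text_encodings (the repo helper may use a different checkpoint or pooling, so treat the model name and shapes as assumptions and check the preprocessing notebook):

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPTokenizer

# images: just resize to the resolution the model expects
# (256x256 here is an assumption - use whatever your config says)
img = Image.open("my_image.jpg").convert("RGB").resize((256, 256))
img_array = np.array(img)

# text: encode captions into CLIP text embeddings, which is roughly what a
# helper like get_text_encodings does under the hood
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer(["a photo of a cat"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    text_emb = clip.get_text_features(**tokens)  # shape (1, 768)
```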
Hey! The speedup happens in the next line: `x0_pred = self.denoiser.predict(nn_inputs, batch_size=self.batch_size)`. Here we only have to call .predict once on the concatenated matrix, which is faster than calling .predict...
`x0_pred_label` is the prediction conditioned on the text embedding and `x0_pred_no_label` is the unconditional prediction (where the text embedding input is all zeros).
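In other words, the classifier-free-guidance step stacks the conditional and zeroed-text inputs into one batch, runs a single .predict, and then splits the result. A simplified sketch with a dummy Keras denoiser (the real model's inputs and shapes differ):

```python
import numpy as np
from tensorflow import keras

# dummy denoiser taking (noisy_latents, text_emb) -> x0 prediction;
# purely a stand-in so the batching logic below is runnable
latent_in = keras.Input(shape=(16, 16, 4))
text_in = keras.Input(shape=(768,))
h = keras.layers.Concatenate()([keras.layers.Flatten()(latent_in), text_in])
x0_out = keras.layers.Reshape((16, 16, 4))(keras.layers.Dense(16 * 16 * 4)(h))
denoiser = keras.Model([latent_in, text_in], x0_out)

batch_size = 8
noisy = np.random.randn(batch_size, 16, 16, 4).astype("float32")
text_emb = np.random.randn(batch_size, 768).astype("float32")

# stack conditional and unconditional (zeroed text embedding) copies so the
# network only has to be called once on a 2x-sized batch
nn_inputs = [
    np.concatenate([noisy, noisy], axis=0),
    np.concatenate([text_emb, np.zeros_like(text_emb)], axis=0),
]
x0_pred = denoiser.predict(nn_inputs, batch_size=batch_size)

# split back into the two predictions and mix them with the guidance scale
x0_pred_label, x0_pred_no_label = x0_pred[:batch_size], x0_pred[batch_size:]
guidance = 3.0
x0_pred_guided = x0_pred_no_label + guidance * (x0_pred_label - x0_pred_no_label)
```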