rectified-flow-pytorch
Feature Request: Text (and other modality) conditioning + classifier-free guidance (CFG)
Hello,
Thanks so much for open sourcing the code.
I have been training an unconditional rectified flow (RF) model on audio latents, and the output quality is really good.
Here are some audio examples:
https://drive.google.com/file/d/169NMzxl0k5X8oqiadNs3e7sjlxz8V5Pk/view?usp=sharing
https://drive.google.com/file/d/1-beCKB8XMQTxtsa64odmMdmmzei7Sejr/view?usp=sharing
https://drive.google.com/file/d/10BswfbCt6Tq3q7RX6lq4-ZiQ-mRUOKP-/view?usp=drive_link
I've been trying to implement text conditioning using embeddings from the T5-base model, but so far I haven't gotten good results in training.
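For reference, here is a minimal sketch of one common way to wire up text conditioning with CFG in a rectified flow model. It is not the rectified-flow-pytorch API. Everything in it is an assumption for illustration: `CondVelocityNet`, `rf_loss`, `sample_cfg`, `p_uncond` and `guidance` are hypothetical names, the latents are flattened to shape (batch, dim), and each sample gets one precomputed pooled text embedding (for T5-base, e.g. the mean of the 768-dim encoder hidden states). During training the condition is randomly swapped for a learned null embedding. At sampling time the conditional and unconditional velocities are mixed as `v_uncond + guidance * (v_cond - v_uncond)`. The sketch uses t = 0 for noise and t = 1 for data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CondVelocityNet(nn.Module):
    # Hypothetical toy velocity network: predicts the flow from (x_t, t, text embedding).
    def __init__(self, dim: int, cond_dim: int, hidden: int = 128):
        super().__init__()
        # Learned "null" embedding that replaces the text condition when it is dropped.
        self.null_cond = nn.Parameter(torch.zeros(cond_dim))
        self.net = nn.Sequential(
            nn.Linear(dim + 1 + cond_dim, hidden),
            nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t, cond, drop_mask=None):
        # drop_mask: bool tensor of shape (batch,); True means "use the null embedding".
        if drop_mask is not None:
            null = self.null_cond.expand_as(cond)
            cond = torch.where(drop_mask[:, None], null, cond)
        return self.net(torch.cat([x, t[:, None], cond], dim=-1))


def rf_loss(model, data, cond, p_uncond=0.1):
    # Straight-line path from noise (t=0) to data (t=1); the target velocity is data - noise.
    noise = torch.randn_like(data)
    t = torch.rand(data.shape[0], device=data.device)
    x_t = (1 - t[:, None]) * noise + t[:, None] * data
    target = data - noise
    # Randomly drop the condition so the same network also learns the unconditional flow.
    drop = torch.rand(data.shape[0], device=data.device) < p_uncond
    pred = model(x_t, t, cond, drop_mask=drop)
    return F.mse_loss(pred, target)


@torch.no_grad()
def sample_cfg(model, cond, dim, steps=16, guidance=3.0):
    # Euler integration from t=0 (noise) to t=1 (data) with classifier-free guidance.
    b = cond.shape[0]
    x = torch.randn(b, dim, device=cond.device)
    all_drop = torch.ones(b, dtype=torch.bool, device=cond.device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((b,), i * dt, device=cond.device)
        v_cond = model(x, t, cond)
        v_uncond = model(x, t, cond, drop_mask=all_drop)
        v = v_uncond + guidance * (v_cond - v_uncond)
        x = x + v * dt
    return x


# Usage with random stand-ins for audio latents and T5-base embeddings.
torch.manual_seed(0)
model = CondVelocityNet(dim=8, cond_dim=768)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.randn(32, 8)
cond = torch.randn(32, 768)
loss = rf_loss(model, data, cond)
opt.zero_grad()
loss.backward()
opt.step()
samples = sample_cfg(model, cond[:4], dim=8)
```

The drop probability of 0.1 and guidance scale of 3.0 are illustrative values, not tuned ones. For real audio latents the toy MLP would be replaced by the actual backbone, and the full T5 token sequence would more typically be fed in through cross-attention rather than as a single pooled vector.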
Any chance this will be implemented in a future version?
Thanks again!