rectified-flow-pytorch

Feature Request: Text (and other modality) conditioning + CFG

Open moiseshorta opened this issue 9 months ago • 2 comments

Hello,

Thanks so much for open sourcing the code.

I have been training an unconditional rectified flow (RF) model on audio latents, with really good quality results.

Here are some audio examples: https://drive.google.com/file/d/169NMzxl0k5X8oqiadNs3e7sjlxz8V5Pk/view?usp=sharing

https://drive.google.com/file/d/1-beCKB8XMQTxtsa64odmMdmmzei7Sejr/view?usp=sharing

https://drive.google.com/file/d/10BswfbCt6Tq3q7RX6lq4-ZiQ-mRUOKP-/view?usp=drive_link

I've been trying to implement text conditioning using embeddings from the T5-base model, but so far I haven't gotten good results in training.

Any chance this will be implemented in a future version?
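
To make the request concrete, here is a rough sketch of the two pieces classifier-free guidance (CFG) usually needs: dropping the condition at random during training, and blending conditional and unconditional velocity predictions at sampling time. It is plain Python on lists so it runs without any dependencies, and it is not this repository's API. All names (`drop_condition`, `cfg_velocity`, `sample_euler`, `toy_velocity`) are hypothetical, the time convention (t=0 is noise, t=1 is data) is an assumption, and `toy_velocity` only stands in for a trained network that would take a T5 text embedding as `cond`.

```python
import random

def drop_condition(cond, null_cond, drop_prob, rng):
    """Training side: with probability drop_prob, swap the text embedding
    for a null embedding so the model also learns the unconditional case."""
    return null_cond if rng.random() < drop_prob else cond

def cfg_velocity(velocity_fn, x, t, cond, null_cond, guidance_scale):
    """Sampling side: v = v_uncond + scale * (v_cond - v_uncond)."""
    v_cond = velocity_fn(x, t, cond)
    v_uncond = velocity_fn(x, t, null_cond)
    return [u + guidance_scale * (c - u) for c, u in zip(v_cond, v_uncond)]

def sample_euler(velocity_fn, x0, cond, null_cond, guidance_scale=3.0, steps=16):
    """Integrate dx/dt = v with Euler steps from t=0 to t=1
    (assumed convention: t=0 is noise, t=1 is data)."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = cfg_velocity(velocity_fn, x, t, cond, null_cond, guidance_scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def toy_velocity(x, t, cond):
    """Hypothetical stand-in for the trained network: points from x toward cond."""
    return [ci - xi for xi, ci in zip(x, cond)]

if __name__ == "__main__":
    rng = random.Random(0)
    text_emb = [1.0, 2.0]   # placeholder for a pooled text embedding
    null_emb = [0.0, 0.0]   # placeholder for the null (dropped) embedding
    print(drop_condition(text_emb, null_emb, 0.1, rng))
    print(sample_euler(toy_velocity, [0.0, 0.0], text_emb, null_emb))
```

A guidance scale of 1.0 reduces to the purely conditional prediction and 0.0 to the unconditional one; values above 1.0 push samples further toward the condition.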

Thanks again!

moiseshorta · Jan 07 '25 01:01