rectified-flow-pytorch Feature Request: Text (and other modality) conditioning + CFG

Open moiseshorta opened this issue 11 months ago • 2 comments

Hello,

Thanks so much for open sourcing the code.

I have been training an unconditional rectified flow (RF) model on audio latents, and the results are of really good quality.

Here are some audio examples: https://drive.google.com/file/d/169NMzxl0k5X8oqiadNs3e7sjlxz8V5Pk/view?usp=sharing

https://drive.google.com/file/d/1-beCKB8XMQTxtsa64odmMdmmzei7Sejr/view?usp=sharing

https://drive.google.com/file/d/10BswfbCt6Tq3q7RX6lq4-ZiQ-mRUOKP-/view?usp=drive_link

I've been trying to implement text conditioning using embeddings from the T5-base model, but so far I haven't gotten good results in training.

Any chance this will be implemented in a future version?
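
To make the request concrete, here is a rough sketch of what text conditioning with classifier-free guidance (CFG) could look like for a rectified flow sampler. It is only an illustration, not the library's API: `drop_condition`, `guided_velocity`, `sample`, and `toy_model` are hypothetical names, the model is assumed to take `(x, t, cond)` and return a velocity, and time is assumed to run from t=0 (noise) to t=1 (data). It uses plain arithmetic, so the same functions should work on PyTorch tensors as well as the floats used here.

```python
import random


def drop_condition(cond, null_cond, drop_prob=0.1, rng=random):
    """Training-time condition dropout (hypothetical helper).

    With probability drop_prob the text embedding is swapped for a null
    embedding, so one model learns both conditional and unconditional velocities.
    """
    return null_cond if rng.random() < drop_prob else cond


def guided_velocity(model, x, t, cond, null_cond, cfg_scale=3.0):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one."""
    v_cond = model(x, t, cond)
    v_uncond = model(x, t, null_cond)
    return v_uncond + cfg_scale * (v_cond - v_uncond)


def sample(model, x0, cond, null_cond, steps=16, cfg_scale=3.0):
    """Euler integration of dx/dt = v(x, t, cond), assuming t=0 is noise
    and t=1 is data."""
    x = x0
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * guided_velocity(model, x, t, cond, null_cond, cfg_scale)
    return x


def toy_model(x, t, cond):
    """Stand-in for a trained network: the velocity is just the condition.
    A real model would be a neural net taking T5 embeddings as cond."""
    return cond


if __name__ == "__main__":
    # Scalars stand in for latents and text embeddings in this toy run.
    print(sample(toy_model, x0=0.0, cond=1.0, null_cond=0.0, steps=4))
```

With `cfg_scale=1.0` the guided velocity reduces to the plain conditional prediction, and larger values push samples harder toward the text condition. In a real setup `null_cond` would be a learned or zeroed embedding with the same shape as the T5 output.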

Thanks again!

Jan 07 '25 01:01 moiseshorta

very cool! yes, I'll get around to it, just backlogged with too many ongoing projects

Jan 07 '25 14:01 lucidrains

Bumping this up, hoping it gets implemented soon 👍

Feb 06 '25 13:02 moiseshorta