How can I change the duration of the output audio?

Open ChloeL19 opened this issue 2 years ago • 3 comments

Hi, this is really a lovely repository. How can I change the duration of the generated audio? Thanks!

ChloeL19, Jun 13 '23 00:06

Okay, so I just forced a different latent size. I wanted one second of output, so I divided the original latent dimension (256) by 10 and rounded up, which gives 26.

def prepare_latents(self, batch_size, inference_scheduler, num_channels_latents, dtype, device):
    # EDIT: the latent size is hardcoded to 256 here, and I want to change it.
    # Original line, kept for reference:
    # shape = (batch_size, num_channels_latents, 256, 16)
    shape = (batch_size, num_channels_latents, 26, 16)  # ceil(256 / 10), scaled to about one second

Indeed, the inference script now outputs audio files that are 1 second long. Is this okay?
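The arithmetic above generalizes to other lengths. Here is a minimal sketch, assuming that 256 latent frames correspond to 10 seconds of audio (which is what dividing by 10 to get one second implies) and that the trailing 16 stays fixed. The helper name `latent_shape_for_duration` and the channel count of 8 in the usage line are hypothetical, not part of the repository, and whether the model sounds good at lengths it was not trained on is not something this checks.

```python
import math

BASE_LATENT_FRAMES = 256   # value hardcoded in prepare_latents
BASE_DURATION_S = 10.0     # assumption: 256 frames ~ 10 seconds of audio


def latent_shape_for_duration(batch_size, num_channels_latents, duration_s, latent_width=16):
    """Return a latent shape whose time axis is scaled to duration_s seconds.

    Hypothetical helper: scales the hardcoded 256 linearly and rounds up,
    mirroring the manual 256 / 10 -> 26 calculation.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    frames = math.ceil(BASE_LATENT_FRAMES * duration_s / BASE_DURATION_S)
    return (batch_size, num_channels_latents, frames, latent_width)


# One second, with a hypothetical channel count of 8
print(latent_shape_for_duration(1, 8, 1.0))  # (1, 8, 26, 16)
```

Inside `prepare_latents`, the hardcoded tuple could then be replaced by a call to a helper like this, with the duration passed in from wherever the inference script reads its arguments.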

ChloeL19, Jun 13 '23 01:06

I suppose duration could be introduced as a training argument, saved as part of the training config, and then read back at inference to set the length of the generated audio...
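A rough sketch of that idea, not the repository's actual training or inference code: a hypothetical `--duration` flag is parsed at training time, serialized with the rest of the arguments, and read back at inference to pick the latent length. The flag name, the JSON layout, and the 256-frames-per-10-seconds ratio are all assumptions made for illustration.

```python
import argparse
import json
import math


def parse_train_args(argv=None):
    """Parse training arguments, including a hypothetical --duration flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--duration", type=float, default=10.0,
                        help="target audio length in seconds (hypothetical flag)")
    return parser.parse_args(argv)


def dump_config(args):
    """Serialize the parsed arguments as the JSON text of a training config."""
    return json.dumps(vars(args))


def latent_frames_from_config(config_text, base_frames=256, base_duration_s=10.0):
    """Read the saved duration back and scale the latent length to match.

    Assumes base_frames latent frames cover base_duration_s seconds.
    """
    config = json.loads(config_text)
    return math.ceil(base_frames * config["duration"] / base_duration_s)


# Training side: record the duration in the config text
saved = dump_config(parse_train_args(["--duration", "1"]))
# Inference side: recover the latent length from the saved config
print(latent_frames_from_config(saved))  # 26
```

Storing the duration next to the other training arguments keeps inference consistent with what the model was trained on, instead of relying on a value edited by hand in `prepare_latents`.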

ChloeL19, Jun 13 '23 02:06

Would really like an audio sample duration feature as well!

cvillela, Jul 03 '23 21:07