tango How can I change the duration of the output audio?

Hi, this is really a lovely repository. But how can I change the duration of the generated audio? Thanks!

Jun 13 '23 00:06 ChloeL19

Okay, so I just kind of forced a different latent embedding size. I wanted one second of output, and I am assuming the hardcoded latent length (256) corresponds to the default output of roughly 10 seconds, so I divided 256 by 10 and rounded up to 26.

def prepare_latents(self, batch_size, inference_scheduler, num_channels_latents, dtype, device):
    # EDIT: the latent size is hardcoded to 256 here; I want to change this.
    # shape = (batch_size, num_channels_latents, 256, 16)  # original
    shape = (batch_size, num_channels_latents, 26, 16)  # ceil(256 / 10), roughly one second

Indeed, the inference script now outputs audio files that are 1 second in length. Is this... okay?
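
For anyone who wants a length other than one second, here is a minimal sketch of the same arithmetic. It assumes, as above, that 256 latent frames correspond to 10 seconds and that the mapping is linear. The helper names `latent_frames_for_duration` and `latent_shape` are made up for illustration and are not part of the repository, and the channel count of 8 in the example call is only a placeholder value.

```python
import math

# Assumption taken from the post above: 256 latent frames ~ 10 seconds of audio.
BASE_LATENT_FRAMES = 256
BASE_DURATION_SECONDS = 10.0


def latent_frames_for_duration(seconds):
    """Hypothetical helper: scale the latent time dimension linearly, rounding up."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return math.ceil(BASE_LATENT_FRAMES * seconds / BASE_DURATION_SECONDS)


def latent_shape(batch_size, num_channels_latents, seconds):
    """Build the shape tuple used in prepare_latents; the last dimension stays at 16."""
    frames = latent_frames_for_duration(seconds)
    return (batch_size, num_channels_latents, frames, 16)


# Illustrative call: batch of 1, placeholder channel count of 8, one second.
print(latent_shape(1, 8, 1.0))
```

This only reproduces the shape calculation. Whether the model produces good audio at lengths it was not trained on is a separate question that the sketch does not answer.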

Jun 13 '23 01:06 ChloeL19

I suppose duration could be introduced as a training argument, saved as part of the training config, and then read back at inference time to set the length of the generated audio...
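
A rough sketch of what that could look like, not how the repository actually handles its config. `TrainingConfig`, `duration_seconds`, and the helper functions are all hypothetical names, and the 256-frames-per-10-seconds ratio is the same assumption as in my earlier comment.

```python
import json
import math
from dataclasses import dataclass, asdict


@dataclass
class TrainingConfig:
    # Hypothetical field; not an existing option in the repository.
    duration_seconds: float = 10.0


def config_to_json(config):
    """Serialize the config so it can be stored next to the checkpoint."""
    return json.dumps(asdict(config))


def config_from_json(text):
    """Restore the config at inference time."""
    return TrainingConfig(**json.loads(text))


def latent_frames(config, frames_per_10s=256):
    """Assumed linear mapping from duration to latent frames, rounded up."""
    return math.ceil(frames_per_10s * config.duration_seconds / 10.0)


# Round trip: save at training time, load at inference time.
saved = config_to_json(TrainingConfig(duration_seconds=1.0))
restored = config_from_json(saved)
print(latent_frames(restored))
```

The value returned by `latent_frames` would then replace the hardcoded 256 in `prepare_latents`.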

Jun 13 '23 02:06 ChloeL19

Would really like an audio sample duration feature as well!

Jul 03 '23 21:07 cvillela