How can I change the duration of the output audio?
Hi, this is a really lovely repository. But how can I change the duration of the generated audio? Thanks!!
Okay, so I just forced a different latent size. I wanted one second of output instead of the default ten, so I divided the original latent time dimension (256) by 10 and rounded up, giving 26.
def prepare_latents(self, batch_size, inference_scheduler, num_channels_latents, dtype, device):
    # EDIT: the latent time length is hardcoded to 256 here, and I want to change it.
    # shape = (batch_size, num_channels_latents, 256, 16)  # original
    shape = (batch_size, num_channels_latents, 26, 16)  # 256 / 10 = 25.6, rounded up: ~1 second
    ...  # rest of the method unchanged
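The arithmetic above can be wrapped in a small helper so other durations don't need to be worked out by hand. This is a sketch assuming that the stock 256 latent frames correspond to about 10 seconds of audio (as dividing by 10 for one second implies) and that length scales linearly; `latent_frames_for_duration` is a hypothetical name, not part of the repository.

```python
import math

# Assumption: the hardcoded latent time length (256) corresponds to roughly
# 10 seconds of audio, and duration scales linearly with latent length.
DEFAULT_LATENT_FRAMES = 256
DEFAULT_SECONDS = 10.0


def latent_frames_for_duration(seconds: float) -> int:
    """Hypothetical helper: scale the latent length linearly and round up."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return math.ceil(DEFAULT_LATENT_FRAMES * seconds / DEFAULT_SECONDS)


print(latent_frames_for_duration(1.0))  # 26
print(latent_frames_for_duration(5.0))  # 128
```

Inside `prepare_latents`, the hardcoded 26 could then become `latent_frames_for_duration(seconds)`, with `seconds` passed in by the caller (also a hypothetical parameter).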
Indeed, the inference script now outputs audio files that are 1 second long. Is this... okay??
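To confirm the generated files really have the expected length, Python's standard `wave` module can read a WAV header and compute the duration. This sketch assumes the inference script writes uncompressed PCM WAV files; the helper name is illustrative.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

For example, `wav_duration_seconds("output/sample.wav")` (hypothetical path). Under the linear-scaling assumption, 26 latent frames correspond to 26 / 25.6 ≈ 1.02 s, so a result slightly above 1 second is plausible rather than exactly 1.0.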
I suppose duration could be introduced as a training argument, saved as part of the training config, and then used this way to set the length of the audio generated at inference time...
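The training-config idea could look roughly like the following sketch: a `--duration` flag is parsed at training time, saved with the other arguments as JSON, and read back at inference to derive the latent length. The flag name, the JSON layout, and the 256-frames-per-10-seconds mapping are all assumptions for illustration, not the repository's actual options.

```python
import argparse
import json
import math

# Assumptions (not from the repository): 256 latent frames ~ 10 s of audio,
# and the flag / config key names below are hypothetical.
DEFAULT_LATENT_FRAMES = 256
DEFAULT_SECONDS = 10.0


def parse_train_args(argv=None):
    parser = argparse.ArgumentParser(description="Training options (sketch)")
    parser.add_argument("--duration", type=float, default=DEFAULT_SECONDS,
                        help="clip length in seconds to train on and generate")
    return parser.parse_args(argv)


def save_config(args, path):
    # Persist every training argument, including duration, next to the checkpoint.
    with open(path, "w") as f:
        json.dump(vars(args), f, indent=2)


def load_latent_frames(path):
    # At inference time, read the saved duration back and convert it to a
    # latent length; configs without the key fall back to the 10 s default.
    with open(path) as f:
        config = json.load(f)
    seconds = config.get("duration", DEFAULT_SECONDS)
    return math.ceil(DEFAULT_LATENT_FRAMES * seconds / DEFAULT_SECONDS)
```

`prepare_latents` would then use the value from `load_latent_frames(...)` in place of the hardcoded 256, so a model trained on 1-second clips generates 1-second clips without editing the source.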
Would really like an audio sample duration feature as well!