AudioLDM
multichannel / stereo
I think I've heard some examples in stereo? Is this possible using the CLI version?
In the Colab notebook, stereo is simulated by generating the left and right channels separately and combining them into one stereo file. The left channel is generated from the text prompt; the right channel is generated by style-transferring the same prompt onto the newly generated left-channel audio with a low transfer strength. It's not true stereo, but it works pretty well most of the time.
I suppose you can do the same with the CLI version, then use a third-party tool like ffmpeg or sox to merge the two mono files into one stereo file:
sox -M left.wav right.wav stereo.wav
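If you'd rather not install sox or ffmpeg, the merge step can probably be done with Python's standard `wave` module. This is only a rough sketch, not something from the AudioLDM repo: the function name `merge_mono_to_stereo` is made up for illustration, and it assumes both files are mono PCM WAVs with the same sample rate and sample width. It cuts the result to the shorter of the two inputs, which may differ from how sox handles a length mismatch.

```python
import wave


def merge_mono_to_stereo(left_path, right_path, out_path):
    """Interleave two mono PCM WAV files into one stereo WAV file.

    Hypothetical helper, not part of AudioLDM. Returns the number of
    frames written.
    """
    with wave.open(left_path, "rb") as left_wav, wave.open(right_path, "rb") as right_wav:
        if left_wav.getnchannels() != 1 or right_wav.getnchannels() != 1:
            raise ValueError("both inputs must be mono")
        if left_wav.getframerate() != right_wav.getframerate():
            raise ValueError("sample rates differ")
        if left_wav.getsampwidth() != right_wav.getsampwidth():
            raise ValueError("sample widths differ")

        width = left_wav.getsampwidth()
        rate = left_wav.getframerate()
        # Assumption: truncate to the shorter file instead of padding.
        n_frames = min(left_wav.getnframes(), right_wav.getnframes())
        left = left_wav.readframes(n_frames)
        right = right_wav.readframes(n_frames)

    # A stereo frame is one left sample followed by one right sample.
    # Copy byte by byte so any PCM sample width works.
    stereo = bytearray(2 * n_frames * width)
    for offset in range(width):
        stereo[offset::2 * width] = left[offset::width]
        stereo[width + offset::2 * width] = right[offset::width]

    with wave.open(out_path, "wb") as out_wav:
        out_wav.setnchannels(2)
        out_wav.setsampwidth(width)
        out_wav.setframerate(rate)
        out_wav.writeframes(bytes(stereo))

    return n_frames
```

Calling `merge_mono_to_stereo("left.wav", "right.wav", "stereo.wav")` should then give roughly the same result as the sox command above. If the generated files turn out not to be plain PCM (for example floating-point WAVs), `wave` will refuse to open them and sox or ffmpeg is the safer choice.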