AudioLDM
Generating more than 10 seconds, with inpainting
Hello, I have been experimenting with generating more than 10 seconds of audio via infilling: 50% past audio (5 seconds) and 50% blank audio (5 seconds). So far I have observed the following:
- the infilled audio has a significantly higher amplitude than the original (that could be fixed, so it is not a big issue)
- the infilled "music" is not coherent; when used for music generation, the output is heavily faded at the beginning and end of the masked region, and only in the middle does it reach a normal gain ("normal" relative to itself, since there is always a big amplitude difference with respect to the original audio)
Is there a way to improve this task, i.e. extending music generation by infilling?
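
For reference, here is a minimal sketch of the setup described above, assuming 16 kHz mono float audio in NumPy arrays. The helper names (`build_infill_window`, `match_gain`) are hypothetical and not part of AudioLDM; the actual infilling call is left out. It shows how the 50% past / 50% blank window and its mask can be built, and how the amplitude mismatch could be reduced by matching the RMS of the infilled half to the context half.

```python
import numpy as np

SAMPLE_RATE = 16000      # assumption: 16 kHz mono audio
WINDOW_SECONDS = 10.0    # assumption: one generation window is 10 seconds


def build_infill_window(past_audio, sample_rate=SAMPLE_RATE,
                        window_seconds=WINDOW_SECONDS):
    """Return (window, mask): the last half-window of past audio, then silence."""
    window_len = int(window_seconds * sample_rate)
    context_len = window_len // 2
    if len(past_audio) < context_len:
        raise ValueError("need at least half a window of past audio")
    window = np.zeros(window_len, dtype=np.float32)
    window[:context_len] = past_audio[-context_len:]
    mask = np.zeros(window_len, dtype=bool)
    mask[context_len:] = True  # True marks the samples to be infilled
    return window, mask


def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.asarray(x, dtype=np.float64) ** 2)))


def match_gain(window, mask, eps=1e-8):
    """Scale the infilled (masked) samples so their RMS matches the context RMS."""
    out = np.asarray(window, dtype=np.float32).copy()
    gain = rms(out[~mask]) / (rms(out[mask]) + eps)
    out[mask] *= gain
    return out
```

After the model has filled the masked half, `match_gain(filled_window, mask)` would bring its overall level in line with the past audio. This only corrects the global gain; it does not address the fade at the start and end of the masked region.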