AudioLDM

Generating more than 10 seconds, with inpainting

Open MultiTrickFox opened this issue 1 year ago • 0 comments

Hello, I have been experimenting with generating more than 10 seconds of audio via infilling: each window is 50% past audio (5 seconds) and 50% blank audio (5 seconds). So far I have observed:

  1. the infilled audio has a significantly higher amplitude than the original audio (that could be fixed, not a big issue)
  2. the infilled "music" is not coherent; when used for music generation, the output is very faded at the beginning and end of the masked region, and only in the middle does it reach a normal gain (normal relative to itself; there is always a big amplitude difference with respect to the original audio)

Is there a way to improve this task, i.e. extending music generation by infilling?
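
For reference, the window layout and the amplitude fix mentioned in point 1 can be sketched roughly as below. This is a minimal NumPy sketch, not AudioLDM's API: the 16 kHz sample rate and the helper names `build_infill_window`, `rms`, and `match_rms` are assumptions made for illustration, and the call into the model's inpainting routine is deliberately left out.

```python
import numpy as np

SAMPLE_RATE = 16000      # assumed sample rate, for illustration only
CONTEXT_SECONDS = 5.0    # 5 s of past audio followed by 5 s of blank audio


def build_infill_window(past_audio, sample_rate=SAMPLE_RATE,
                        context_seconds=CONTEXT_SECONDS):
    """Hypothetical helper: return (window, mask) for one infilling step.

    window = last `context_seconds` of past audio + the same length of zeros.
    mask   = True where the model is expected to fill in audio.
    """
    n = int(sample_rate * context_seconds)
    context = np.asarray(past_audio, dtype=np.float64)[-n:]
    if len(context) < n:
        # Left-pad with silence if there is less than 5 s of history.
        context = np.concatenate([np.zeros(n - len(context)), context])
    window = np.concatenate([context, np.zeros(n)])
    mask = np.concatenate([np.zeros(n, dtype=bool), np.ones(n, dtype=bool)])
    return window, mask


def rms(x, eps=1e-12):
    """Root-mean-square level; eps avoids division by zero on silence."""
    x = np.asarray(x, dtype=np.float64)
    return float(np.sqrt(np.mean(x ** 2) + eps))


def match_rms(generated, mask):
    """Scale the infilled region so its RMS matches the unmasked context."""
    out = np.asarray(generated, dtype=np.float64).copy()
    gain = rms(out[~mask]) / rms(out[mask])
    out[mask] *= gain
    return out
```

Usage would be something like `window, mask = build_infill_window(past)`, then passing `window` and `mask` to whatever inpainting call is used, then `match_rms(result, mask)` on the returned waveform. Note that this applies a single scalar gain to the whole masked region, so it only addresses the overall level difference from point 1; it does not change the fade at the start and end of the masked region described in point 2.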

MultiTrickFox • Mar 08 '23 12:03