seamless_communication

Silent Parts of the Audio

Open m-pektas opened this issue 2 years ago • 2 comments

First of all, thanks for sharing this great work as open source.

When I use SeamlessM4T with a 15-second audio clip, the translated output is only 5 seconds long. The silent parts are removed from the audio, but I want to run the translation while keeping the length of the original audio. Do you know how I can do that?

m-pektas, Dec 19 '23 10:12

Hi! One potential solution would be the following:

  1. Detect the silence and voice in the source audio using some external voice activity detection model.
  2. Split the source audio into voice-only and silence-only segments.
  3. Translate the voice-only segments with Seamless.
  4. Concatenate the silence segments with the translated segments in the right order, to get the right duration.
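The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: a simple RMS energy threshold stands in for a real voice activity detection model, `split_by_energy` and `translate_keeping_silence` are hypothetical helper names, and `translate_segment` is a placeholder callable that you would implement yourself around Seamless (it is not part of the Seamless API). It also assumes the translated audio uses the same sample rate as the source.

```python
import numpy as np


def split_by_energy(audio, sample_rate, frame_ms=30, threshold=0.01):
    """Steps 1-2: label fixed-size frames as voice or silence by RMS energy
    and merge neighbouring frames with the same label.

    Returns a list of (is_voice, start, end) sample ranges covering the audio.
    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    segments = []
    for start in range(0, len(audio), frame_len):
        end = min(start + frame_len, len(audio))
        rms = float(np.sqrt(np.mean(audio[start:end] ** 2)))
        is_voice = rms > threshold
        if segments and segments[-1][0] == is_voice:
            # Same label as the previous frame: extend that segment.
            segments[-1] = (is_voice, segments[-1][1], end)
        else:
            segments.append((is_voice, start, end))
    return segments


def translate_keeping_silence(audio, sample_rate, translate_segment):
    """Steps 3-4: translate voice segments, keep silence segments untouched,
    and concatenate everything in the original order.

    translate_segment is a hypothetical callable: waveform in, waveform out.
    """
    pieces = []
    for is_voice, start, end in split_by_energy(audio, sample_rate):
        chunk = audio[start:end]
        pieces.append(translate_segment(chunk) if is_voice else chunk)
    return np.concatenate(pieces)
```

Note that the result only has the original duration if each translated segment is as long as the segment it replaces, so some padding or trimming around the translated pieces may still be needed.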

avidale, Mar 14 '24 15:03

Hi, @avidale. Thanks for your answer. I solved the problem with exactly the same approach. But there was another issue: the translated speech can sometimes have a different length than the original. If the translated segment is shorter, the solution is simple: we add extra silence to the silent parts. But if the translated segment is longer, unfortunately, we need a new solution :)
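A minimal sketch of the shorter case, assuming waveforms are 1-D NumPy arrays and that the extra silence is appended right after the translated segment. `fit_to_slot` is a hypothetical helper name, not something from Seamless. For the longer case it only reports how many samples do not fit and leaves the audio unchanged, since that case is still unsolved here.

```python
import numpy as np


def fit_to_slot(translated, slot_len):
    """Pad a translated segment with trailing silence up to slot_len samples.

    Returns (audio, overflow). overflow is the number of samples by which the
    translation exceeds the slot; it is 0 when the segment fits.
    """
    overflow = max(0, len(translated) - slot_len)
    if overflow == 0:
        # Shorter (or equal): fill the rest of the slot with silence.
        padding = np.zeros(slot_len - len(translated), dtype=translated.dtype)
        translated = np.concatenate([translated, padding])
    # Longer: returned unchanged; the caller has to decide what to do.
    return translated, overflow
```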

m-pektas, Mar 14 '24 15:03