seamless_communication Silent Parts of the Audio

First of all, thanks for sharing this great work as open source.

When I run SeamlessM4T on a 15-second audio clip, the translated output is only 5 seconds long. The silent parts are removed from the audio, but I want to translate while keeping the length of the original audio. Do you know how I can do that?

Dec 19 '23 10:12 m-pektas

Hi! One potential solution would be the following:

1. Detect the silence and voice in the source audio using some external voice activity detection model.
2. Split the source audio into voice-only and silence-only segments.
3. Translate the voice-only segments with Seamless.
4. Concatenate the silence segments with the translated segments in the right order, to get the right duration.
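
A rough sketch of these steps in plain Python, in case it helps. Everything in it is an assumption for illustration: `detect_segments` is a toy energy threshold standing in for a real voice activity detection model, the `frame_len` and `threshold` values are arbitrary, and `translate` is a hypothetical callable that you would implement yourself around Seamless (it is not a function from the library).

```python
from typing import Callable, List, Sequence, Tuple

# (is_voice, start_sample, end_sample)
Segment = Tuple[bool, int, int]


def detect_segments(samples: Sequence[float],
                    frame_len: int = 160,
                    threshold: float = 0.01) -> List[Segment]:
    """Toy VAD: mark each frame as voice if its mean energy exceeds
    `threshold`, then merge neighbouring frames with the same label."""
    segments: List[Segment] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        is_voice = energy > threshold
        end = start + len(frame)
        if segments and segments[-1][0] == is_voice:
            # Same label as the previous frame: extend that segment.
            segments[-1] = (is_voice, segments[-1][1], end)
        else:
            segments.append((is_voice, start, end))
    return segments


def translate_keeping_silence(samples: Sequence[float],
                              translate: Callable[[List[float]], List[float]],
                              frame_len: int = 160,
                              threshold: float = 0.01) -> List[float]:
    """Translate only the voice segments and copy the silence through
    unchanged, concatenating everything in the original order."""
    out: List[float] = []
    for is_voice, start, end in detect_segments(samples, frame_len, threshold):
        chunk = list(samples[start:end])
        out.extend(translate(chunk) if is_voice else chunk)
    return out
```

Note that the output only has the original duration if each translated segment happens to be as long as the segment it replaces; this sketch does nothing to enforce that.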

Mar 14 '24 15:03 avidale

Hi, @avidale. Thanks for your answer. I solved the problem with exactly the same approach. But there was another issue: the translated speech is sometimes a different length from the original. If the translated speech is shorter, the solution is simple: we add extra silence to the silent parts. But if the translated speech is longer, unfortunately, we need a new solution :)
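
To make the length bookkeeping concrete, here is a minimal sketch. The shorter case is the padding described above. The longer case uses one possible idea, which is only an assumption and not something settled in this thread: borrow time from the silence that follows the segment. The function name `fit_segment` and its parameters are hypothetical, and all lengths are in samples.

```python
from typing import List, Sequence, Tuple


def fit_segment(translated: Sequence[float],
                voice_len: int,
                silence_len: int) -> Tuple[List[float], int]:
    """Return (speech, new_silence_len) for one voice segment and the
    silence that follows it, trying to keep their combined length equal
    to voice_len + silence_len."""
    extra = len(translated) - voice_len
    if extra <= 0:
        # Translation is shorter (or equal): lengthen the following
        # silence by the shortfall so the total duration is unchanged.
        return list(translated), silence_len - extra
    # Translation is longer: take the overflow out of the following
    # silence. If the silence is too short, the total still grows.
    borrowed = min(extra, silence_len)
    return list(translated), silence_len - borrowed
```

For example, with a 100-sample voice slot followed by 50 samples of silence, an 80-sample translation leaves 70 samples of silence and a 120-sample translation leaves 30, so the total stays at 150 in both cases. A 200-sample translation uses up all the silence and still overruns, so that case would need something else, such as speeding up the translated audio.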

Mar 14 '24 15:03 m-pektas