OpenVoice: How to decide the output audio duration

I want to know the exact output duration before generating the audio through the tone_color_converter.convert call. How can I achieve that?

I believe I need to use the speed argument somehow. Any help would be welcome.

Feb 07 '25 07:02 UtkuBulkan

Hey @UtkuBulkan! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together!

I couldn't find specific information on how the speed argument affects the output audio duration in the tone_color_converter.convert function. However, there are some default parameters in the function that might influence audio duration, such as tau=0.3 and message="default" ^[1]. Additionally, configuration settings in the codebase, like those in the split_audio_whisper and split_audio_vad functions, can affect audio duration by filtering segments based on their length or splitting audio into segments of a specified duration ^[2].

To predict the output audio duration, you might need to experiment with these parameters and observe their effects on the audio processing.

To continue talking to Dosu, mention @dosu.

Feb 07 '25 07:02 dosubot[bot]

@dosu, how can I elaborate on this further? I want to call the following function once with an additional duration argument, so that it alters the speed accordingly and produces an audio file of exactly the duration I want.

def convert(self, audio_src_path, src_se, tgt_se, output_path=None, tau=0.3, message="default"):

Feb 11 '25 22:02 UtkuBulkan

To achieve the desired output duration by adjusting the speed dynamically, you would need to calculate the required speed factor based on the original duration of the input audio and the target duration. Here's a general approach you can take:

1. Calculate the Original Duration: Use a utility function to determine the duration of the input audio file. You can use libraries like librosa or pydub to get the duration.
2. Determine the Speed Factor: Calculate the speed factor needed to achieve the target duration. This can be done using the formula:
   \[ \text{speed\_factor} = \frac{\text{original\_duration}}{\text{target\_duration}} \]
3. Modify the convert Function: Add a parameter for the target duration and adjust the speed accordingly before calling the conversion process.
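As a rough illustration of the first two steps, here is a minimal sketch that uses only Python's standard-library wave module, so it applies to uncompressed PCM WAV files only. The helper names wav_duration_seconds and speed_factor_for are hypothetical and are not part of OpenVoice.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (stdlib only)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def speed_factor_for(original_duration, target_duration):
    """Return the speed factor that maps original_duration onto target_duration.

    A factor above 1.0 means faster (shorter) audio; below 1.0 means slower.
    """
    if original_duration <= 0 or target_duration <= 0:
        raise ValueError("durations must be positive numbers of seconds")
    return original_duration / target_duration
```

For example, fitting a 12-second clip into 10 seconds needs a factor of 12 / 10 = 1.2, while stretching it to 15 seconds needs 12 / 15 = 0.8.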

Here's a conceptual example of how you might implement this:

import librosa

def convert(self, audio_src_path, src_se, tgt_se, output_path=None, tau=0.3, message="default", target_duration=None):
    # Default to normal speed when no target duration is requested
    speed_factor = 1.0

    if target_duration is not None:
        if target_duration <= 0:
            raise ValueError("target_duration must be a positive number of seconds")
        # Read the source duration in seconds (librosa >= 0.10 uses `path`;
        # older releases call this keyword `filename`)
        original_duration = librosa.get_duration(path=audio_src_path)
        # A factor above 1.0 shortens the audio, below 1.0 lengthens it
        speed_factor = original_duration / target_duration

    # Use the speed factor in your conversion process.
    # NOTE: set_speed is a hypothetical hook, not a confirmed OpenVoice method;
    # replace it with whatever speed control your pipeline actually exposes.
    self.model.set_speed(speed_factor)

    # Proceed with the conversion
    # Your existing conversion logic here

This approach allows you to specify a target duration, and the function will adjust the speed to match that duration. Note that you may need to adapt this example to fit the specifics of your codebase and the convert function's implementation.
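A speed factor alone may land only close to the target, since synthesis and resampling can round to whole frames. If you need the duration to be exact to the sample, one option is to trim or zero-pad the result afterwards. The sketch below assumes mono audio held as a plain sequence of samples; fit_to_exact_length is a hypothetical helper, not an OpenVoice function.

```python
def fit_to_exact_length(samples, sample_rate, target_duration):
    """Trim or zero-pad mono samples to exactly target_duration seconds."""
    if sample_rate <= 0 or target_duration < 0:
        raise ValueError("sample_rate must be positive and target_duration non-negative")
    # Number of samples that corresponds to the requested duration
    target_len = round(target_duration * sample_rate)
    samples = list(samples)
    if len(samples) >= target_len:
        # Too long: drop the tail
        return samples[:target_len]
    # Too short: append silence
    return samples + [0] * (target_len - len(samples))
```

Trimming cuts off whatever audio sits at the end, so this is best used only to correct a small residual mismatch after the speed adjustment, not as a replacement for it.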

Feb 11 '25 22:02 dosubot[bot]