
Split on silence but without changing the total length of the audio?

Open youssefavx opened this issue 4 years ago • 2 comments

Hi, I'm facing a problem where I'd like to programmatically split two files in exactly the same way. Meaning I split the first file on its silences (into chunks of roughly 5 seconds), and then I'd like to split the second file at those exact timestamps. The audio in the two files must line up exactly.

The way I'm going about this is to split the first file, then iterate over its chunks while accumulating their durations, and slice the second file at `[total_duration_so_far - last_chunk_duration : total_duration_so_far]` for each chunk.

The trouble is that, since I'd like the audio in both files to line up perfectly, there must not be any drift. Does the `keep_silence` parameter affect this?

The problem is that the chunks seem to end at different points. For example, the first chunk of the first file ends at a different point in the audio than the first chunk of the second file: the first ends with someone saying "people", while in the second that word is cut off. The difference is roughly 0.7 seconds.

Here is my code so far:

from pydub import AudioSegment
from pydub.silence import split_on_silence
import os

def split(filepath):
    sound = AudioSegment.from_file(filepath)
    #original_sound = AudioSegment.from_file(original_filepath)
    dBFS = sound.dBFS
    chunks = split_on_silence(
        sound,
        min_silence_len=500,
        silence_thresh=dBFS - 16,
        # keep_silence=250,  # optional
    )
    return chunks

original_sound = 'clean.wav'
originalchunks = split(original_sound)


lecture_title = 'lectitle.wav'

chunks = originalchunks
target_length = 4 * 1000  # minimum length of each chunk: 4 seconds, in ms
output_chunks = [chunks[0]]
for chunk in chunks[1:]:
    if len(output_chunks[-1]) < target_length:
        output_chunks[-1] += chunk
    else:
        # if the last output chunk is longer than the target length,
        # we can start a new one
        output_chunks.append(chunk)

if not os.path.exists('segments'):
    os.mkdir('segments')


for i, chunk in enumerate(output_chunks):
    # Create a silence chunk that's 0.5 seconds (or 500 ms) long for padding.
    #silence_chunk = AudioSegment.silent(duration=500)

    # Add the padding chunk to beginning and end of the entire chunk.
    #audio_chunk = silence_chunk + chunk + silence_chunk

    # Normalize the entire chunk.
    #normalized_chunk = match_target_amplitude(audio_chunk, -20.0)

    # Export the audio chunk with new bitrate.
    #print(lecture_title + " {0}.wav.".format(i))
    chunk.export(
        'segments/' + lecture_title.replace('.wav','')  + "{0}.wav".format(i),
        format = "wav"
    )

Then for the second file:


if not os.path.exists('noisysegments'):
    os.mkdir('noisysegments')

simulated_noise = 'noisyfile.wav'
noisy_audio = AudioSegment.from_wav(simulated_noise)

# Slice the noisy file at the same cumulative offsets as the clean chunks.
# pydub lengths and slices are in milliseconds, so len(chunk) avoids the
# float round-trip through duration_seconds (which can drop sub-ms detail).
offset = 0
for i, output_chunk in enumerate(output_chunks):
    chunk_ms = len(output_chunk)
    noisy_chunk = noisy_audio[offset:offset + chunk_ms]
    offset += chunk_ms
    noisy_chunk.export(
        'noisysegments/' + simulated_noise.replace('.wav', '') + ' actual noise' + str(i) + '.wav',
        format="wav"  # export defaults to mp3, so name the format explicitly
    )

If you do this with two files, you'll find that the first chunks end at different points in time.

youssefavx avatar Sep 29 '20 19:09 youssefavx

`keep_silence=True` should do the trick

~~but there is a bug in pydub.silence.split_on_silence: silence at the start of the first segment is removed~~

milahu avatar Feb 28 '22 16:02 milahu

Well, I am testing this and I consistently get about +100 ms of drift on the joined chunks. I tried 10-, 30-, and 90-minute audio files; the results are more or less the same. I am using pydub to split files so they can be transcribed in chunks by OpenAI Whisper, and then I need to stitch the chunks back together and match timecodes so I get consistent subtitles back. I use `keep_silence=True`.

alhafoudh avatar Mar 23 '23 13:03 alhafoudh