pysox splitting a file with silence

One typical use case for me is to split a file using the silence effect as outlined in this excellent sox tutorial https://madskjeldgaard.dk/posts/sox-tutorial-split-by-silence/

sox input.wav clip.wav silence 1 0.1 1% 1 0.1 1% : newfile : restart

gives "clip001.wav", "clip002.wav", etc., each containing only the audio of one clip, without the silence in between.
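As a rough mental model of what the command above does, here is a simplified pure-Python sketch over normalized samples (floats in [-1, 1]). Each clip starts once the level has stayed above the threshold for the minimum duration (the first `1 0.1 1%` triplet) and ends once it has stayed below it for that duration (the second triplet). The search then restarts, as `: newfile : restart` does. It assumes a plain per-sample absolute-amplitude test; sox's real silence effect measures level differently, so boundaries will not match exactly. `split_by_silence` is a hypothetical name.

```python
def split_by_silence(samples, sample_rate, threshold=0.01, min_duration=0.1):
    """Return (start, end) sample indices of the non-silent clips in `samples`.

    Simplified model, not sox's algorithm: a sample is "loud" when its absolute
    value exceeds `threshold` (a fraction of full scale, so 0.01 ~ "1%").
    """
    min_len = max(1, round(min_duration * sample_rate))  # duration in samples
    loud = [abs(s) > threshold for s in samples]
    n = len(loud)
    clips = []
    i = 0
    while i < n:
        # Like the "above" triplet: wait for min_len consecutive loud samples.
        start, run = None, 0
        while i < n:
            run = run + 1 if loud[i] else 0
            i += 1
            if run >= min_len:
                start = i - run
                break
        if start is None:
            break  # only silence (or bursts shorter than min_len) remain
        # Like the "below" triplet: stop at min_len consecutive quiet samples.
        end, run = n, 0
        while i < n:
            run = run + 1 if not loud[i] else 0
            i += 1
            if run >= min_len:
                end = i - run
                break
        clips.append((start, end))  # ": newfile : restart" -> look for the next clip
    return clips
```

Divide the returned indices by `sample_rate` to get times in seconds.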

Is it possible to do this in pysox (and apply a fade-in and fade-out to each resulting clip)?

Sep 19 '20 12:09 shakfu

Yes. See: https://github.com/rabitt/pysox/blob/master/sox/transform.py#L2769

Sep 21 '20 02:09 lostanlen

I should clarify my question: I'm aware that the 'silence' effect is implemented. My question was more about how I could translate the ": newfile : restart" idiom to pysox.

Sep 21 '20 02:09 shakfu

I'm not sure if this is supported in pysox. It might be good to bring this up with @rabitt.

Sep 21 '20 03:09 lostanlen

Thanks for your response, @lostanlen. I will post a more clearly specified feature request, then.

Sep 21 '20 03:09 shakfu

I have a small bash script that uses sox to split an input file into a number of clips based on a silence threshold and then applies a fade (in/out) to the resulting output files.

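#!/usr/bin/env bash
# splits the input file into clip files based on a silence threshold
sox --show-progress "$1" clip.wav silence 1 0.1 1% 1 0.1 1% : newfile : restart

# applies a 0.1 s fade-in and fade-out to each output file
# ("fade 0.1 0": stop position 0 = end of audio; fade-out length defaults to the fade-in length)
for f in clip*.wav
do
    case "$f" in *-f.wav) continue ;; esac  # skip files already faded by a previous run
    name=$(basename -s .wav "$f")
    newname="$name-f.wav"
    sox "$f" "$newname" fade 0.1 0
done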
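For comparison, here is a rough pure-Python port of the script above. It still shells out to the sox binary (assumed to be on PATH) through subprocess rather than using pysox. The helper names are hypothetical; building the commands in separate functions makes it easy to inspect exactly what will run.

```python
import glob
import os
import subprocess


def split_command(input_path, prefix="clip"):
    # Same arguments as the bash script: one silence effect with an "above"
    # triplet (start copying) and a "below" triplet (stop at silence), then
    # ": newfile : restart" so sox writes clip001.wav, clip002.wav, ...
    return ["sox", "--show-progress", input_path, f"{prefix}.wav",
            "silence", "1", "0.1", "1%", "1", "0.1", "1%",
            ":", "newfile", ":", "restart"]


def fade_command(clip_path, fade_len=0.1):
    # Mirrors "sox $f $name-f.wav fade 0.1 0" from the bash loop.
    base, ext = os.path.splitext(clip_path)
    return ["sox", clip_path, f"{base}-f{ext}", "fade", str(fade_len), "0"]


def split_and_fade(input_path, prefix="clip"):
    subprocess.run(split_command(input_path, prefix), check=True)
    for clip in sorted(glob.glob(f"{prefix}*.wav")):
        if clip.endswith("-f.wav"):
            continue  # skip outputs from a previous run
        subprocess.run(fade_command(clip), check=True)
```

Call `split_and_fade("input.wav")` from the directory where the clips should be written.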

My question to @rabitt is whether it is possible to translate this script's functionality (in particular the ": newfile : restart" idiom) into a pure-python pysox solution.

Sep 21 '20 07:09 shakfu

Trying to convert the splitting part of the above sox call into current pysox, I got as far as the following:

import sox
t = sox.Transformer()
t.silence(1, 1.0, 0.1)
t.silence(-1, 1.0, 0.1)
t.build('input.wav', 'clip.wav', extra_args=[':', 'newfile', ':', 'restart'])

# pysox converts this into the following sox args:

args = ['sox', '-D', '-V2', '-c', '1', 's104.wav', 'clip.wav', 'silence', '1', '0.100000', '1.000000%', 'reverse', 'silence', '1', '0.100000', '1.000000%', 'reverse', ':', 'newfile', ':', 'restart']

The problem is that this doesn't split the input file. I presume the culprit is the translation of location=-1 into 'reverse ... reverse', which always outputs a single file. Also, the argument order of pysox's silence() didn't match the command-line silence args.
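A likely reason, as we read the sox manual: on the command line, `silence 1 0.1 1% 1 0.1 1%` is a single effect whose second triplet ("below periods") makes sox stop at the first stretch of silence, and it is that stop which lets `: newfile : restart` begin the next file. The pysox calls above produce two separate effects, neither with a stop triplet, so the chain never ends early. An untested workaround, assuming `build()` appends `extra_args` after the Transformer's own effects (as the printed args suggest), is to leave the effect chain empty and pass the whole command-line effect through `extra_args`. `silence_split_args` and `split_with_pysox` are hypothetical helper names.

```python
def silence_split_args(duration=0.1, threshold="1%"):
    # Hypothetical helper: the full CLI-style silence effect plus the idiom.
    return ["silence", "1", str(duration), threshold,  # start after `duration` above threshold
            "1", str(duration), threshold,             # stop at `duration` below threshold
            ":", "newfile", ":", "restart"]           # write clip001.wav, ... and repeat


def split_with_pysox(input_path, output_path="clip.wav"):
    import sox  # imported here so the arg builder above works without pysox installed

    tfm = sox.Transformer()  # no effects added: everything goes through extra_args
    return tfm.build(input_path, output_path, extra_args=silence_split_args())
```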

Sep 24 '20 09:09 shakfu

Hey @shakfu

> My question to @rabitt is whether it is possible to translate this script's functionality (in particular the ": newfile : restart" idiom) into a pure-python pysox solution.

The short answer is no: pysox doesn't currently support the newfile : restart idiom, but we could extend the API to support it.

> Also the arg order for pysox silence didn't match with the command line silence args.

Yes, this is the case for several of the transforms. The documentation should describe what each argument does, but it's true that the order may not exactly match the command-line tool's. Note that for the silence command in particular, the "location" argument in pysox is there to support removing silence from the end of the file, hence the reverse.
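To make this concrete, here is a small model of the effect arguments pysox printed in the example above for `location=1` and `location=-1`. It is inferred only from that printed output, not from pysox's source, and it ignores `location=0` and `buffer_around_silence`; `pysox_style_silence` is a hypothetical name.

```python
def pysox_style_silence(location, silence_threshold, min_silence_duration):
    """Model of the sox args printed above for pysox's silence() (location 1 or -1)."""
    triplet = ["silence", "1", f"{min_silence_duration:f}", f"{silence_threshold:f}%"]
    if location == 1:
        return triplet  # trim leading silence only
    if location == -1:
        # trim trailing silence: reverse, trim the (now leading) silence, reverse back
        return ["reverse"] + triplet + ["reverse"]
    raise ValueError("this sketch only models location=1 and location=-1")
```

Neither form contains the second `1 0.1 1%` triplet that the command-line `silence 1 0.1 1% 1 0.1 1%` uses to stop at silence.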

Feb 18 '21 18:02 rabitt

@rabitt I do love pysox, it rocks, guys, and thanks!

I am going to steal @shakfu's bash script, but I would also love to do this in pysox, since I'm creating datasets for KWS (keyword spotting) models. I use VAD too, but VAD can sometimes give confusing results (the spectral representation doesn't always work out well), so I often use silence instead: its result is more logical than occasionally wondering about a curious VAD result.

Thanks, @shakfu, for the script; I was just about to ask about silence splitting when I saw your post.

PS: It would also be useful if it could skip the processing and just output the split points to a text file. That might help with an ASR aligner, as I have yet to find one that extracts words satisfactorily.
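For the "just output the split points" idea, here is a minimal sketch, assuming the clip boundaries in seconds are already known (for example from a detector of your own). It writes them as tab-separated start/end/label lines, the plain-text layout Audacity imports as a label track. The function names are hypothetical and nothing here comes from pysox.

```python
import csv


def write_split_points(segments, path):
    """Write (start_s, end_s) pairs as tab-separated 'start end label' lines."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for i, (start, end) in enumerate(segments, start=1):
            writer.writerow([f"{start:.3f}", f"{end:.3f}", f"clip{i:03d}"])


def read_split_points(path):
    """Read the file back as a list of (start_s, end_s, label) tuples."""
    with open(path, newline="") as f:
        return [(float(s), float(e), label)
                for s, e, label in csv.reader(f, delimiter="\t")]
```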

Feb 22 '21 22:02 StuartIanNaylor

@rabitt Thanks for your response. It would be great if the pysox API could be extended to accommodate this use case. Naturally, it would be nice to accomplish this in Python (-:

@StuartIanNaylor Thanks, glad that my little script can be of use. Incidentally, I was curious about the answer to your last question and found some possible solutions in this Stack Overflow exchange.

Feb 22 '21 23:02 shakfu

@shakfu

import tempfile

import sox


def get_voice_params(file, silence_maximum_amplitude, file_min_silence_duration=0.2):
    stat = sox.file_info.stat(file)
    file_maximum_amplitude = stat['Maximum amplitude']
    file_duration = stat['Length (seconds)']

    # express the absolute silence level as a percentage of this file's peak
    percent_silence_threshold = (silence_maximum_amplitude / file_maximum_amplitude) * 100

    # the temp files are deleted automatically when the with-block exits
    with tempfile.NamedTemporaryFile(suffix='.wav') as tmp1, \
            tempfile.NamedTemporaryFile(suffix='.wav') as tmp2:
        tfm1 = sox.Transformer()
        # trim trailing silence: the remaining length is where the voice ends
        tfm1.silence(location=-1, min_silence_duration=file_min_silence_duration,
                     silence_threshold=percent_silence_threshold, buffer_around_silence=True)
        tfm1.build(file, tmp1.name)  # was kw_files[0]: use the function's argument
        tfm1.clear_effects()

        stat = sox.file_info.stat(tmp1.name)
        voice_end = stat['Length (seconds)']

        # trim leading silence from that result: the remaining length is the voice itself
        tfm1.silence(location=1, min_silence_duration=file_min_silence_duration,
                     silence_threshold=percent_silence_threshold, buffer_around_silence=True)
        tfm1.build(tmp1.name, tmp2.name)
        tfm1.clear_effects()

        stat = sox.file_info.stat(tmp2.name)

    voice_duration = stat['Length (seconds)']
    voice_start = voice_end - voice_duration
    return file_maximum_amplitude, file_duration, voice_start, voice_end, voice_duration


# kw_file and silence_maximum_amplitude are defined elsewhere in my script
file_maximum_amplitude, file_duration, voice_start, voice_end, voice_duration = get_voice_params(kw_file, silence_maximum_amplitude)

print(file_maximum_amplitude, file_duration, voice_start, voice_end, voice_duration)

It just suddenly clicked that I didn't like writing out to the hard drive all the time, so I use temp files. I still need to change the Python logging to stop the warnings, but that's an easy addition.
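For the warnings, here is a minimal sketch, assuming pysox emits them through a standard-library logger named "sox" (we believe it does, but check your version): raise that logger's level before calling build().

```python
import logging

# Assumption: pysox logs through logging.getLogger("sox").
# Raising the level to ERROR hides WARNING messages but still shows errors.
logging.getLogger("sox").setLevel(logging.ERROR)
```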

May 12 '21 21:05 StuartIanNaylor