speech_recognition Saved audio recorded with SR plays choppy and too fast

Open • antimatter84 opened this issue 2 years ago • 1 comment

Steps to reproduce

Record audio from a USB audio interface (Focusrite Scarlett) with a Microphone() instance
Save to file with Python's wave module

Here's some example code that shows what I do (pieced together from the actual source):

import speech_recognition as sr
import wave

mic_index = 7  # focusrite scarlett input

recognizer = sr.Recognizer()
mic = sr.Microphone(device_index=mic_index)

print('Recording...')
with mic as source:
    recognizer.adjust_for_ambient_noise(source, duration=0.2)
    audio = recognizer.listen(source, timeout=1, phrase_time_limit=5)
    
wave_file = wave.open('audiotest.wav', 'wb')
wave_file.setnchannels(1)
wave_file.setsampwidth(2)
wave_file.setframerate(16000)
wave_file.writeframes(audio.get_wav_data(convert_rate=16000))
wave_file.close()
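
A side note on the saving step, separate from the tempo problem: get_wav_data() returns the bytes of a complete WAV file, header included, so passing it to writeframes() embeds a second header inside the sample data. A minimal sketch that writes headerless frames instead is shown here. The helper name save_audio is hypothetical, and audio is assumed to behave like an sr.AudioData object. This is not expected to cure the choppiness by itself.

```python
import wave


def save_audio(audio, path, rate=16000, width=2):
    """Write mono audio to `path` as a WAV file with a single header.

    `audio` is assumed to behave like speech_recognition.AudioData, i.e. to
    provide get_raw_data(convert_rate=..., convert_width=...).
    """
    # get_raw_data() returns headerless PCM frames, which is what
    # writeframes() expects; the wave module writes the header itself.
    frames = audio.get_raw_data(convert_rate=rate, convert_width=width)
    with wave.open(path, "wb") as wave_file:
        wave_file.setnchannels(1)
        wave_file.setsampwidth(width)
        wave_file.setframerate(rate)
        wave_file.writeframes(frames)
    return len(frames) // width  # number of mono frames written
```

With the names from the snippet above, the call would be save_audio(audio, 'audiotest.wav'). Alternatively, the bytes from audio.get_wav_data(convert_rate=16000) can be written straight to a file opened in 'wb' mode, without the wave module.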

Expected behaviour

The written wave file should sound like the original audio source: clean and at the correct tempo.

Actual behaviour

The written wave file sounds somewhat choppy and way too fast. audiotest.wav.zip

Recording audio from the device with arecord -D plughw:1,0 -f cd -d 5 alsatest.wav produces a clean result.
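
Note that -f cd makes arecord capture 16-bit, 44100 Hz stereo, while the Python script writes 16000 Hz mono, so the two files are expected to differ in format. To compare them beyond listening, the headers and durations can be read back with the wave module. This is a small sketch; describe_wav is a hypothetical helper name. If the reported duration of audiotest.wav is clearly shorter than the time actually spent speaking, that would suggest samples were lost during capture rather than at playback.

```python
import wave


def describe_wav(path):
    """Return the header parameters and duration of a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

For example, print(describe_wav('audiotest.wav')) and print(describe_wav('alsatest.wav')) show the two files side by side.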

System information

My system is Linux Mint 20.3 Cinnamon.

My Python version is 3.8.10.

My Pip version is 20.0.2.

My SpeechRecognition library version is 3.9.0.

My PyAudio library version is 0.2.13.

My microphones are:

HDA NVidia: HDMI 0 (hw:0,3)
HDA NVidia: HDMI 1 (hw:0,7)
HDA NVidia: HDMI 2 (hw:0,8)
HDA NVidia: HDMI 3 (hw:0,9)
HDA NVidia: HDMI 4 (hw:0,10)
HDA NVidia: HDMI 5 (hw:0,11)
HDA NVidia: HDMI 6 (hw:0,12)
Scarlett 2i2 USB: Audio (hw:1,0)
HD-Audio Generic: ALC1220 Analog (hw:2,0)
HD-Audio Generic: ALC1220 Digital (hw:2,1)
HD-Audio Generic: ALC1220 Alt Analog (hw:2,2)
C922 Pro Stream Webcam: USB Audio (hw:3,0)
hdmi
pulse
default

My working microphones are:

{
  7: 'Scarlett 2i2 USB: Audio (hw:1,0)',
  11: 'C922 Pro Stream Webcam: USB Audio (hw:3,0)',
  13: 'pulse',
  14: 'default'
}

Jan 08 '23 01:01 antimatter84

Hi @antimatter84,

I had the exact same experience and what helped me greatly was playing around with the chunk_size parameter. In my case, setting it to 512 instead of the default 1024 drastically increased the quality of the recorded audio. That also made recognition (with VOSK) much more reliable.
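
For reference, a minimal sketch of that change. chunk_size is a keyword argument of sr.Microphone and sets the number of frames per buffer. The helper names open_microphone and chunk_duration_ms are hypothetical, and 512 is only the value that worked on my setup, so other devices may need a different one.

```python
def open_microphone(device_index, chunk_size=512, sample_rate=None):
    """Create an sr.Microphone with a smaller-than-default buffer size."""
    # Imported lazily; requires the SpeechRecognition and PyAudio packages.
    import speech_recognition as sr

    return sr.Microphone(
        device_index=device_index,
        sample_rate=sample_rate,  # None lets the library use the device default
        chunk_size=chunk_size,    # frames per buffer; the library default is 1024
    )


def chunk_duration_ms(chunk_size, sample_rate):
    """Length of one buffer in milliseconds at the given sample rate."""
    return 1000.0 * chunk_size / sample_rate
```

In the original snippet this would replace the sr.Microphone(device_index=mic_index) line with mic = open_microphone(mic_index), and the rest of the with block stays the same. At 16000 Hz, a 512-frame buffer covers 32 ms of audio and a 1024-frame buffer covers 64 ms.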

Give it a shot and let me know how it goes :)

Dec 04 '23 13:12 pgeschwill
