Speech to text directly from audio data stream without saving into (.wav) file

Open Niklaus28 opened this issue 3 years ago • 1 comments

I would like to transcribe the audio data directly with the recognize_google function, without first saving it to a .wav file.

Steps to reproduce

import wave

import pyaudio
import speech_recognition as sr


def save_chunk(fr, chunk_no, rate, p, r):
    '''
    Take the frames for one chunk, save them as a WAV file in the Chunks
    folder, then extract the spoken text from that file and display it.

    Question: can I extract the audio from the frames directly, using the
    same methodology, without having to create audio files in the folder?
    '''
    print("this has received {} frames".format(len(fr)))

    try:
        # Write the raw frames to a mono 16-bit WAV file.
        chunk_name = "Chunks/chunk_{}.wav".format(chunk_no)
        wf = wave.open(chunk_name, 'wb')
        wf.setnchannels(1)
        wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
        wf.setframerate(rate)
        wf.writeframes(b''.join(fr))
        wf.close()

        # Read the file back and send it to the Google recognizer.
        with sr.AudioFile(chunk_name) as source:
            audio = r.listen(source)
            text = r.recognize_google(audio)
            print(text)

    except Exception as e:
        print("No Audio Detected", e)

Expected behaviour


Actual behaviour


System information

My system is Windows 10 x64.

My Python version is 3.8.5.

My Pip version is 21.0.1.

My SpeechRecognition library version is 3.8.1.

My PyAudio library version is 0.2.11.

My microphones are: (You can check this by running python -c "import speech_recognition as sr;print(sr.Microphone.list_microphone_names())".)

My working microphones are: (You can check this by running python -c "import speech_recognition as sr;print(sr.Microphone.list_working_microphones())".)

I installed PocketSphinx from <INSERT SOURCE HERE>. (For example, from the Debian repositories, from Homebrew, or from the source code.)

Apr 14 '21 03:04 Niklaus28

@Uberi can you please take a look at this issue?

Apr 14 '21 03:04 Niklaus28
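
One possible way to skip the intermediate file, sketched under assumptions rather than confirmed by the maintainers: the frames in save_chunk are raw 16-bit mono PCM, so they can be wrapped in an sr.AudioData object (frame data, sample rate, sample width) and passed straight to recognize_google. A second route keeps the WAV step but writes it to an io.BytesIO buffer, which sr.AudioFile accepts in place of a file name. The helper names transcribe_frames, frames_to_wav_buffer and transcribe_via_memory_wav are hypothetical and exist only for this illustration.

```python
import io
import wave

# Bytes per sample for pyaudio.paInt16 (16-bit PCM), as used in save_chunk.
SAMPLE_WIDTH = 2


def transcribe_frames(frames, rate, recognizer, sample_width=SAMPLE_WIDTH):
    """Route 1 (hypothetical helper): wrap the raw frames in sr.AudioData.

    Assumes `frames` is the list of byte buffers read from a mono paInt16
    PyAudio stream and `recognizer` is an sr.Recognizer instance.
    """
    # Imported lazily so the rest of this sketch runs without the library.
    import speech_recognition as sr

    audio = sr.AudioData(b"".join(frames), rate, sample_width)
    # May raise sr.UnknownValueError (no speech) or sr.RequestError (API).
    return recognizer.recognize_google(audio)


def frames_to_wav_buffer(frames, rate, sample_width=SAMPLE_WIDTH):
    """Route 2 (hypothetical helper): build the WAV in memory, not on disk."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    buf.seek(0)  # rewind so a reader starts at the WAV header
    return buf


def transcribe_via_memory_wav(frames, rate, recognizer):
    """Same flow as save_chunk, but reading from the in-memory buffer."""
    import speech_recognition as sr

    with sr.AudioFile(frames_to_wav_buffer(frames, rate)) as source:
        audio = recognizer.record(source)  # read the whole buffer
    return recognizer.recognize_google(audio)
```

Inside save_chunk, the wave.open(...) and sr.AudioFile(...) steps could then be replaced by a single call such as text = transcribe_frames(fr, rate, r). The first route avoids building a WAV header at all; the second may be useful when other code expects a WAV-formatted stream.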
