
not getting transcribed text

Open ankur995 opened this issue 10 months ago • 7 comments

I am sending the audio blob from MediaRecorder to a Django WebSocket like this:

    navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
        const mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
        const socket = new WebSocket('ws://localhost:8000/ws/transcribe/');

        mediaRecorder.addEventListener('dataavailable', async (event) => {
            if (event.data.size > 0) {
                const audioBlob = await event.data.arrayBuffer();
                socket.send(audioBlob);
                console.log(audioBlob);
            }
        });
    });

but I am not able to convert the audioBlob to a format Vosk supports; I have tried everything and cannot convert it. Earlier I saved the audio to a file and converted it afterwards, and that worked fine, but now I am trying to do real-time transcription and that is not working.
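For context, a minimal sketch of how the Django side could receive these binary frames, assuming a Django Channels AsyncWebsocketConsumer; the class name and routing are my assumptions, since only the conversion code is shown below:

    from channels.generic.websocket import AsyncWebsocketConsumer

    class TranscribeConsumer(AsyncWebsocketConsumer):
        async def connect(self):
            await self.accept()

        async def receive(self, text_data=None, bytes_data=None):
            # Each MediaRecorder blob arrives as one binary WebSocket frame.
            if bytes_data:
                wav_bytes = await self.convert_to_wav(bytes_data)
                if wav_bytes:
                    text = await self.transcribe_audio(wav_bytes)
                    await self.send(text_data=text)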

ankur995 avatar Aug 28 '23 09:08 ankur995

I am not able to convert it using this:

    # module-level imports used by this method
    import io
    import wave

    from pydub import AudioSegment

    async def convert_to_wav(self, audio_blob):
        sample_rate = 16000
        try:
            audio = AudioSegment.from_file(io.BytesIO(audio_blob), format="webm")
            audio = audio.set_frame_rate(sample_rate).set_channels(1).set_sample_width(2)

            wav_buffer = io.BytesIO()
            with wave.open(wav_buffer, 'wb') as wav_file:
                wav_file.setnchannels(1)
                wav_file.setsampwidth(2)
                wav_file.setframerate(sample_rate)
                wav_file.writeframes(audio.raw_data)

            return wav_buffer.getvalue()
        except Exception as e:
            print("Error converting to WAV:", str(e))
            return None
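For what it's worth, once the audio is 16 kHz, mono, 16-bit PCM, Vosk does not need a WAV container at all; the raw samples can be fed to the recognizer directly. A minimal sketch, assuming a local model directory called "model" (the path and the chunk handling are my assumptions, not part of the code above):

    import json
    from vosk import Model, KaldiRecognizer

    model = Model("model")                 # assumed path to an unpacked Vosk model
    rec = KaldiRecognizer(model, 16000)    # must match the sample rate of the PCM

    def feed_pcm(pcm_bytes):
        # pcm_bytes is raw s16le mono PCM at 16 kHz, e.g. audio.raw_data from pydub.
        if rec.AcceptWaveform(pcm_bytes):
            return json.loads(rec.Result()).get("text", "")
        # Partial results are returned under the "partial" key, not "text".
        return json.loads(rec.PartialResult()).get("partial", "")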

I also tried another approach but did not get the desired result:

    import asyncio
    import io

    async def convert_to_wav(self, audio_blob):
        sample_rate = 16000
        try:
            wav_buffer = io.BytesIO()

            cmd = [
                'ffmpeg', '-loglevel', 'quiet', '-f', 's16le', '-ar', str(sample_rate),
                '-ac', '1', '-i', '-', '-f', 'wav', '-'
            ]

            process = await asyncio.create_subprocess_exec(
                *cmd,
                stdin=asyncio.subprocess.PIPE,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE
            )

            process.stdin.write(audio_blob)
            process.stdin.close()

            wav_data = await process.stdout.read()
            await process.wait()

            wav_buffer.write(wav_data)

            return wav_buffer.getvalue()
        except Exception as e:
            print("Error converting to WAV:", str(e))
            return None
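One thing that stands out in this version: the '-f s16le' placed before '-i -' tells ffmpeg the incoming data is already raw 16-bit PCM, but what arrives from MediaRecorder is webm. A sketch of a command line that instead lets ffmpeg probe the container and emit raw PCM (standard ffmpeg options, but untested against this exact setup); note that even then, dataavailable chunks after the first are not standalone webm files, so decoding chunk by chunk remains fragile:

    cmd = [
        'ffmpeg', '-loglevel', 'quiet',
        '-i', '-',                  # read webm from stdin, let ffmpeg detect the format
        '-ar', str(sample_rate),    # resample to 16000 Hz
        '-ac', '1',                 # downmix to mono
        '-f', 's16le',              # output raw 16-bit little-endian PCM
        '-'                         # write to stdout
    ]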

ankur995 avatar Aug 28 '23 09:08 ankur995

It is not trivial to convert a webm stream. Use WAV format for MediaRecorder instead. In general, we have client samples in the vosk-server project; you can check the web example there. If you want a fast response, use WebRTC instead of WebSocket.

nshmyrev avatar Aug 28 '23 09:08 nshmyrev

Can you please provide the links for the client samples and for WebRTC?

ankur995 avatar Aug 28 '23 09:08 ankur995

https://github.com/alphacep/vosk-server/tree/master/client-samples

https://github.com/alphacep/vosk-server/tree/master/webrtc

nshmyrev avatar Aug 28 '23 11:08 nshmyrev

Thank you so much, let me explore this. One more question: is there any Vosk model for en-hi that can transcribe both Hindi and English, i.e. a model that writes Hindi words in Latin script and English words in English?

ankur995 avatar Aug 28 '23 11:08 ankur995

We have an initial model like this; it is not released yet.

nshmyrev avatar Aug 28 '23 20:08 nshmyrev

For sending audio through the WebSocket I am now using this:

    function sendAudio(audioDataChunk) {
        if (webSocket.readyState === WebSocket.OPEN) {
            const inputData = audioDataChunk.inputBuffer.getChannelData(0) || new Float32Array(bufferSize);
            const targetBuffer = new Int16Array(inputData.length);
            for (let index = inputData.length - 1; index >= 0; index--) {
                targetBuffer[index] = 32767 * Math.min(1, inputData[index]);
            }
            webSocket.send(targetBuffer.buffer);
            console.log(targetBuffer.buffer);
        }
    }

and for transcribing I am doing this:

    async def transcribe_audio(self, audio_data):
        try:
            print("Transcribing audio...")
            if not self.recognizer:
                print("Recognizer not initialized.")
                return ""

            self.recognizer.AcceptWaveform(audio_data)
            print("Waveform accepted.")

            result = json.loads(self.recognizer.Result())
            print("Recognition result:", result)
            transcribed_text = result.get("text", "").strip()

            partial_result = self.recognizer.PartialResult()
            if partial_result:
                partial_text = json.loads(partial_result).get("text", "").strip()
                print("Partial Transcription:", partial_text)

            print("Transcription:", transcribed_text)
            return transcribed_text
        except Exception as e:
            print("Recognition error:", str(e))
            return ""

So here I am getting transcribed text, but it is missing a few words. I am using a sample rate of 8000 Hz.
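Two things may be worth checking here, based only on the code above rather than anything confirmed in this thread: the browser's AudioContext usually captures at 44100 or 48000 Hz, so a recognizer created for 8000 Hz will see mismatched samples unless the audio is resampled first, and calling Result() after every chunk finalizes the utterance each time, which can drop words at chunk boundaries. A sketch of the usual pattern, assuming self.recognizer is a KaldiRecognizer created with the sample rate of the PCM actually being sent:

    async def transcribe_audio(self, audio_data):
        # Only read Result() when AcceptWaveform signals the end of an utterance;
        # otherwise report the running partial hypothesis (key "partial", not "text").
        if self.recognizer.AcceptWaveform(audio_data):
            result = json.loads(self.recognizer.Result())
            return result.get("text", "").strip()
        partial = json.loads(self.recognizer.PartialResult())
        print("Partial Transcription:", partial.get("partial", ""))
        return ""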

ankur995 avatar Aug 29 '23 06:08 ankur995