vosk-api
not getting transcribed text
I am sending an audio blob from MediaRecorder to a Django WebSocket using this:

navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  const mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
  const socket = new WebSocket('ws://localhost:8000/ws/transcribe/');

  mediaRecorder.addEventListener('dataavailable', async (event) => {
    if (event.data.size > 0) {
      const audioBlob = await event.data.arrayBuffer();
      socket.send(audioBlob);
      console.log(audioBlob);
    }
  });
});

but I am not able to convert the audioBlob to a format Vosk supports; I have tried everything. Earlier I saved the audio to a file and then converted it, and that worked fine, but now I am trying to do real-time transcription and that is not working.
I am not able to convert it using this:

async def convert_to_wav(self, audio_blob):
    sample_rate = 16000
    try:
        # Decode the webm chunk with pydub, then force 16 kHz mono 16-bit.
        audio = AudioSegment.from_file(io.BytesIO(audio_blob), format="webm")
        audio = audio.set_frame_rate(sample_rate).set_channels(1).set_sample_width(2)

        wav_buffer = io.BytesIO()
        with wave.open(wav_buffer, 'wb') as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2)
            wav_file.setframerate(sample_rate)
            wav_file.writeframes(audio.raw_data)
        return wav_buffer.getvalue()
    except Exception as e:
        print("Error converting to WAV:", str(e))
        return None
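As an aside, Vosk's KaldiRecognizer.AcceptWaveform expects raw 16-bit mono PCM, so once pydub has decoded and resampled the chunk, audio.raw_data can be fed to the recognizer directly; the wave step above only wraps the same PCM in a 44-byte WAV header. A minimal stdlib sketch of that wrapping step (a hypothetical pcm_to_wav_bytes helper, for reference):

```python
import io
import wave

def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM in an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)  # header sizes are fixed up on close
    return buf.getvalue()
```

The output is just a 44-byte RIFF/WAVE header followed by the unchanged PCM payload, which is why skipping this step makes no difference to the recognizer.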
I also tried another approach but did not get the desired result:

async def convert_to_wav(self, audio_blob):
    sample_rate = 16000
    try:
        cmd = [
            'ffmpeg', '-loglevel', 'quiet',
            '-i', '-',  # read the webm chunk from stdin; let ffmpeg probe the container
            '-f', 'wav', '-ar', str(sample_rate), '-ac', '1', '-',
        ]
        process = await asyncio.create_subprocess_exec(
            *cmd,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE
        )
        # communicate() feeds stdin and drains stdout together,
        # avoiding pipe deadlocks on larger chunks
        wav_data, _ = await process.communicate(audio_blob)
        return wav_data
    except Exception as e:
        print("Error converting to WAV:", str(e))
        return None
It is not trivial to convert a webm stream. Use the WAV format for MediaRecorder instead. In general, we have client samples in the vosk-server project; you can check the web example there. If you want a fast response, use WebRTC instead of a WebSocket.
Can you please provide the links for the client samples and for WebRTC?
https://github.com/alphacep/vosk-server/tree/master/client-samples
https://github.com/alphacep/vosk-server/tree/master/webrtc
Thank you so much, let me explore this. One more question: is there any Vosk model for en-hi that can transcribe both Hindi and English, i.e. a model that writes Hindi words in English (Latin) script and English in English?
We have an initial model like this; it is not yet released.
For sending audio through the WebSocket I am using this now:

function sendAudio(audioDataChunk) {
  if (webSocket.readyState === WebSocket.OPEN) {
    const inputData = audioDataChunk.inputBuffer.getChannelData(0) || new Float32Array(bufferSize);
    const targetBuffer = new Int16Array(inputData.length);
    for (let index = inputData.length - 1; index >= 0; index--) {
      // clamp to [-1, 1] before scaling; without Math.max,
      // samples below -1 overflow the Int16 range
      targetBuffer[index] = 32767 * Math.max(-1, Math.min(1, inputData[index]));
    }
    webSocket.send(targetBuffer.buffer);
    console.log(targetBuffer.buffer);
  }
}
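For testing the Django side without a browser, the same float-to-int16 conversion can be sketched server-side in Python (a hypothetical float_to_int16_pcm helper, not part of the consumer above):

```python
import struct

def float_to_int16_pcm(samples) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes,
    clamping out-of-range values, mirroring the client-side loop."""
    clamped = (max(-1.0, min(1.0, s)) for s in samples)
    return b''.join(struct.pack('<h', int(32767 * s)) for s in clamped)
```

Sending the output of this function over the WebSocket produces the same byte stream the JavaScript loop sends, which makes it easy to exercise the recognizer with synthetic input.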
and for transcribing I am doing this:

async def transcribe_audio(self, audio_data):
    try:
        print("Transcribing audio...")
        if not self.recognizer:
            print("Recognizer not initialized.")
            return ""
        # AcceptWaveform returns True only at an endpoint; read Result()
        # then, otherwise PartialResult() holds the running hypothesis.
        if self.recognizer.AcceptWaveform(audio_data):
            result = json.loads(self.recognizer.Result())
            print("Recognition result:", result)
            transcribed_text = result.get("text", "").strip()
            print("Transcription:", transcribed_text)
            return transcribed_text
        # note: the partial JSON uses the "partial" key, not "text"
        partial = json.loads(self.recognizer.PartialResult())
        partial_text = partial.get("partial", "").strip()
        print("Partial Transcription:", partial_text)
        return partial_text
    except Exception as e:
        print("Recognition error:", str(e))
        return ""
So here I am getting transcribed text, but it is missing a few words. Here I am using a sample rate of 8000 Hz.
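A likely cause of the dropped words is a sample-rate mismatch: the browser's AudioContext usually captures at 44100 or 48000 Hz, so if the KaldiRecognizer was created for 8000 Hz, the audio must be downsampled before it is fed in (or the context created with a matching rate, e.g. new AudioContext({ sampleRate: 16000 })). A rough stdlib sketch of the downsampling step (a hypothetical resample_pcm16 helper; linear interpolation with no anti-aliasing filter, so a real pipeline should resample properly, e.g. with ffmpeg):

```python
import array

def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Resample 16-bit mono PCM via linear interpolation (illustration only)."""
    src = array.array('h')
    src.frombytes(pcm)
    if len(src) < 2 or src_rate == dst_rate:
        return pcm
    n_out = max(2, round(len(src) * dst_rate / src_rate))
    step = (len(src) - 1) / (n_out - 1)  # map output index -> source position
    out = array.array('h')
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(src) - 1)
        frac = pos - lo
        out.append(int(round(src[lo] * (1 - frac) + src[hi] * frac)))
    return out.tobytes()
```

Whichever way the rate is matched, the number passed to KaldiRecognizer at construction has to equal the rate of the PCM actually sent, or the model will silently mis-hear and drop words.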