VoiceStreamAI
Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS
Hello @alesaccoia, thanks for the work on VoiceStreamAI. I was trying to make this work, but after setting the config and successfully establishing the WebSocket connection, ...
While running the program, irrelevant output like "okay" and "thank you" appears during silence periods. Is there a way to fix this, or is it inherent to faster-whisper?
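Whisper-family models are known to hallucinate filler phrases like "thank you" on silent audio. Two common mitigations (general faster-whisper practice, not something this repo necessarily does) are passing `vad_filter=True` to `WhisperModel.transcribe`, or dropping segments whose `no_speech_prob` is high. A minimal sketch of the latter, with a hypothetical `Segment` stand-in for faster-whisper's segment objects:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Stand-in for a faster-whisper segment; the real object also
    # carries timestamps, avg_logprob, etc.
    text: str
    no_speech_prob: float


def drop_probable_silence(segments, threshold=0.6):
    """Keep only segments the model considers likely real speech.

    Segments with a high no_speech_prob are usually hallucinated
    filler produced during silence.
    """
    return [s for s in segments if s.no_speech_prob < threshold]


segments = [
    Segment("Thank you.", 0.92),    # typical silence hallucination
    Segment("Hello, world.", 0.05),  # real speech
]
kept = drop_probable_silence(segments)
print([s.text for s in kept])  # ['Hello, world.']
```

The `0.6` threshold is a starting point to tune, not a documented default.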
I am using this code, but I have a question. In src/buffering_strategy/buffering_strategies.py, line 87, I can't understand why the following code is needed: `self.client.buffer.clear()`. Please help me. Thank you.
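Based on the linked code, the buffering strategy appears to move the incoming `buffer` into `scratch_buffer` before handing it to the transcriber; clearing `buffer` afterwards prevents the same bytes from being appended and transcribed again on the next trigger. A simplified sketch of that pattern (hypothetical names and threshold, not the repo's exact code):

```python
class Client:
    def __init__(self):
        self.buffer = bytearray()          # audio received since the last trigger
        self.scratch_buffer = bytearray()  # audio handed to the transcriber


def on_chunk(client, chunk, max_len=16):
    client.buffer += chunk
    if len(client.buffer) >= max_len:
        # Copy the accumulated audio into the processing buffer...
        client.scratch_buffer += client.buffer
        # ...then clear it, so the same bytes are not copied (and
        # re-transcribed) again when the next chunk arrives.
        client.buffer.clear()


c = Client()
on_chunk(c, b"\x00" * 16)
print(len(c.scratch_buffer), len(c.buffer))  # 16 0
```

Without the `clear()`, every trigger would re-append all previously seen audio, so the transcriber would process duplicate speech and the buffer would grow without bound.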
Wondering if the documentation of samples_width is correct here: https://github.com/alesaccoia/VoiceStreamAI/blob/465403b7039d1f54ba6b8d69c69c40b55bf300c1/src/server.py#L25 When the buffer and scratch_buffer durations are calculated, there is no division by 8: https://github.com/alesaccoia/VoiceStreamAI/blob/465403b7039d1f54ba6b8d69c69c40b55bf300c1/src/buffering_strategy/buffering_strategies.py#L116-L118 https://github.com/alesaccoia/VoiceStreamAI/blob/465403b7039d1f54ba6b8d69c69c40b55bf300c1/src/buffering_strategy/buffering_strategies.py#L73-L77
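The discrepancy is most likely bytes versus bits: if `samples_width` is measured in bytes per sample (2 for 16-bit PCM), the duration calculation needs no division by 8; dividing by 8 would only be needed if the width were given in bits. A sketch of the arithmetic, assuming 16 kHz mono 16-bit PCM (the values are assumptions, not read from the repo's config):

```python
SAMPLE_RATE = 16000   # samples per second
SAMPLES_WIDTH = 2     # bytes per sample (16-bit PCM)


def buffer_duration_seconds(buffer: bytes) -> float:
    # len(buffer) is in bytes, so:
    #   bytes / (samples/s * bytes/sample) = seconds
    return len(buffer) / (SAMPLE_RATE * SAMPLES_WIDTH)


one_second = bytes(SAMPLE_RATE * SAMPLES_WIDTH)  # 32000 bytes
print(buffer_duration_seconds(one_second))  # 1.0
```

So the code is consistent if `samples_width` means bytes; if the docstring says bits, the docstring is the part that needs fixing.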
https://github.com/alesaccoia/VoiceStreamAI/assets/39730824/c86907e4-f6df-4bce-bf75-090ee34f7384
How can I use audio chunks from a video element and transcribe them? Thanks.
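One way (a sketch, not something this repo ships) is to tap the `<video>` element's audio with the Web Audio API and convert the Float32 samples to 16-bit PCM before sending them over the WebSocket. The browser wiring in `streamVideoAudio` is hypothetical and only runs in a browser; the conversion helper is the portable part:

```javascript
// Convert Web Audio Float32 samples (-1..1) to 16-bit signed PCM,
// the format a PCM-expecting server would consume.
function floatTo16BitPCM(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Browser-only wiring (hypothetical sketch, not executed here):
// route the <video> element's audio through a processing node and
// stream PCM chunks over an open WebSocket.
function streamVideoAudio(videoEl, ws) {
  const ctx = new AudioContext({ sampleRate: 16000 });
  const source = ctx.createMediaElementSource(videoEl);
  const processor = ctx.createScriptProcessor(4096, 1, 1);
  source.connect(processor);
  processor.connect(ctx.destination); // keep the video audible
  processor.onaudioprocess = (e) => {
    const pcm = floatTo16BitPCM(e.inputBuffer.getChannelData(0));
    if (ws.readyState === WebSocket.OPEN) ws.send(pcm.buffer);
  };
}
```

Note that `createScriptProcessor` is deprecated in favor of `AudioWorklet`, which does the same tap off the main thread; the PCM conversion is identical either way.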
Current UI: https://github.com/alesaccoia/VoiceStreamAI/assets/39730824/98cef236-7ed5-474c-b184-b082b8526df9
New UI: https://github.com/alesaccoia/VoiceStreamAI/assets/39730824/c997744b-63c9-4bc2-ad62-a537a06a8d96