test_gradio.py with vosk not working

Open erdoganensar opened this issue 1 year ago • 9 comments

Hello, I want to serve Vosk through Gradio using the Turkish model, but I could not get it to work.

The code runs, but nothing is written to the Gradio screen.

When I debug the code, I can see that the audio arrives as bytes, but rec.Result() returns an empty result.


('\n ', (<vosk.KaldiRecognizer object at 0x000002543D113760>, []))


I am sharing the sample code below. Can you help, please?

import json
import gradio as gr

from vosk import KaldiRecognizer, Model

model = Model(r"C:\Users\Administrator\PycharmProjects\vosk\model\vosk-model-small-tr-0.3\vosk-model-small-tr-0.3")

def transcribe(data, state):
    sample_rate, audio_data = data
    audio_data = (audio_data >> 16).astype("int16").tobytes()

    if state is None:
        rec = KaldiRecognizer(model, sample_rate)
        result = []
    else:
        rec, result = state

    if rec.AcceptWaveform(audio_data):
        text_result = json.loads(rec.Result())["text"]
        if text_result != "":
            result.append(text_result)
        partial_result = ""
    else:
        partial_result = json.loads(rec.PartialResult())["partial"] + " "

    return "\n".join(result) + "\n" + partial_result, (rec, result)

gr.Interface(
    fn=transcribe,
    inputs=[
        gr.Audio(source="microphone", type="numpy", streaming=True),
        "state"
    ],
    outputs=[
        "textbox",
        "state"
    ],
    live=True).launch(share=True)

erdoganensar avatar Dec 02 '22 12:12 erdoganensar

Please click on "Edit" and format your post properly

nshmyrev avatar Dec 02 '22 15:12 nshmyrev

I edited but I don't understand exactly what you mean @nshmyrev

erdoganensar avatar Dec 05 '22 06:12 erdoganensar

Still needs edits. You can check here: https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-code

nshmyrev avatar Dec 05 '22 07:12 nshmyrev

I've edited it. Could you please help solve the problem? @nshmyrev

erdoganensar avatar Dec 05 '22 07:12 erdoganensar

The Chrome browser doesn't support audio recording on localhost. Are you running it on your local machine or on a remote server? Do you access the remote server over https? Did you enable audio recording on localhost as described here:

https://stackoverflow.com/questions/16835421/how-to-allow-chrome-to-access-my-camera-on-localhost

nshmyrev avatar Dec 05 '22 08:12 nshmyrev

Yes, I am running it locally. I changed the Chrome setting as described, but that did not resolve it. I even tried another browser and it still didn't work. When I debug the code, it looks like the binary audio is not being converted to text: the text in the returned data is empty, while the object next to it is populated.

Also, I have other code that runs in the console rather than in the browser, and it works. The conversion logic there is a little different. I want the same thing to work in Gradio, as if it were a web server. @nshmyrev

erdoganensar avatar Dec 05 '22 10:12 erdoganensar

It is still a permission issue, so you get no audio. You need to check the JavaScript console log.
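One way to tell a "no audio" problem apart from other causes is to measure the peak amplitude of the bytes just before they are passed to rec.AcceptWaveform. The sketch below is only an illustration: `peak_amplitude` is a hypothetical helper that is not part of Vosk or Gradio, and it assumes the bytes are 16-bit signed PCM in native byte order.

```python
from array import array


def peak_amplitude(pcm_bytes: bytes) -> int:
    """Return the largest absolute sample value in a 16-bit PCM buffer.

    Hypothetical diagnostic helper; assumes signed 16-bit samples in
    native byte order.
    """
    samples = array("h")
    # Ignore a trailing odd byte so frombytes() does not raise.
    usable = len(pcm_bytes) - (len(pcm_bytes) % 2)
    samples.frombytes(pcm_bytes[:usable])
    return max((abs(s) for s in samples), default=0)


# Example with made-up sample values.
silent = array("h", [0, -1, 0, -1]).tobytes()
speech_like = array("h", [1200, -3400, 900]).tobytes()
print(peak_amplitude(silent), peak_amplitude(speech_like))
```

A peak of 0 or 1 on every chunk means the recognizer is effectively receiving silence, whatever the cause. Clearly larger values suggest that audio is reaching the server and the problem lies elsewhere.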

nshmyrev avatar Dec 05 '22 21:12 nshmyrev

Hi, I tried to use Vosk with Gradio, and it's not working. Speech is not being recognized.

Chrome setting: [screenshot]

Web console: [screenshot]

mirfan899 avatar Jun 24 '23 01:06 mirfan899

I changed `audio_data = (audio_data >> 16).astype("int16").tobytes()` to `audio_data = audio_data.astype("int16").tobytes()`, and then it works.
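A likely explanation, assuming Gradio delivers the microphone chunk as int16 samples (the thread does not state the dtype): shifting a 16-bit value right by 16 bits leaves only the sign, so every sample becomes 0 or -1 and the recognizer hears silence. The sketch below shows that effect and a dtype-aware conversion. `to_pcm16_bytes` is a hypothetical helper, not part of Vosk or Gradio, and it assumes float input is scaled to the range -1.0 to 1.0.

```python
import numpy as np


def to_pcm16_bytes(audio_data: np.ndarray) -> bytes:
    """Convert an audio chunk to 16-bit PCM bytes (hypothetical helper)."""
    if audio_data.dtype == np.int16:
        # Already 16-bit PCM: pass through unchanged.
        pcm = audio_data
    elif audio_data.dtype == np.int32:
        # Keep the 16 most significant bits of each 32-bit sample.
        pcm = (audio_data >> 16).astype(np.int16)
    elif np.issubdtype(audio_data.dtype, np.floating):
        # Assumption: floats are in [-1.0, 1.0]; scale to the int16 range.
        pcm = (np.clip(audio_data, -1.0, 1.0) * 32767).astype(np.int16)
    else:
        raise TypeError(f"unsupported dtype: {audio_data.dtype}")
    return pcm.tobytes()


# Made-up int16 samples: the shift wipes out everything but the sign.
chunk = np.array([1200, -3400, 32767, -32768], dtype=np.int16)
print((chunk >> 16).tolist())
print(to_pcm16_bytes(chunk) == chunk.tobytes())
```

In transcribe, the conversion line would then read `audio_data = to_pcm16_bytes(audio_data)`. Printing `audio_data.dtype` once is a quick way to confirm which branch applies to the installed Gradio version.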

monk-after-90s avatar Aug 25 '23 10:08 monk-after-90s