whisper.unity

Hallucinations and VAD [BLANK_AUDIO] Generations

Open atx-barnes opened this issue 1 year ago • 5 comments

Tested with both small and tiny model sizes.

I'm using the Streaming example with VAD turned on. I've tried different settings, and I've tried using a prompt to eliminate hallucinations and sound effects, but to no avail, and I can't get VAD to work properly. I might be missing something, but it seems to treat the hallucinated sounds as words, so VAD struggles to trigger. Examples of the output are below:

When I'm not talking and the background noise is low, the following gets transcribed:

[BLANK_AUDIO] [BLANK_AUDIO] [BLANK_AUDIO]

Ideally, it would run inference in the background and only pick up incoming audio when I'm talking.

Most of the time with the tiny model, it hallucinates sound effects from silence or low background noise:

(wind blowing), (clicking), (barking)

Are there any settings I can try that would help eliminate hallucinations from silence or static, or get VAD working correctly?
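
For reference, one stopgap would be to post-filter each transcribed segment and drop anything that is only a bracketed or parenthesized annotation. The sketch below is in Python purely to illustrate the logic. It is not whisper.unity code, the helper names `strip_non_speech` and `is_speech` are hypothetical, and it assumes non-speech output is always wrapped in `[]` or `()`:

```python
import re

# Matches whole annotations such as [BLANK_AUDIO] or (wind blowing).
# Assumption: non-speech output is always wrapped in [] or ().
_NON_SPEECH = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def strip_non_speech(text: str) -> str:
    """Remove bracketed/parenthesized annotations and collapse whitespace."""
    cleaned = _NON_SPEECH.sub(" ", text)
    return " ".join(cleaned.split())


def is_speech(text: str) -> bool:
    """Return True if anything is left once annotations are removed."""
    return bool(strip_non_speech(text))


if __name__ == "__main__":
    for segment in ["[BLANK_AUDIO] [BLANK_AUDIO]",
                    "(wind blowing) hello there (clicking)"]:
        print(repr(strip_non_speech(segment)), is_speech(segment))
```

This only hides the symptom: it would also strip legitimate parenthetical speech, and it does nothing to help VAD decide when speech starts or stops, so a proper setting would still be preferable.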

Great project, excited for any future features or updates.

atx-barnes • Jul 28 '23 22:07