vosk-api
Using a microphone in C#
Hi! I have a problem with speech recognition through a microphone using NAudio.
public class VoskDemo {
    static VoskRecognizer? rec;
    static WaveFileWriter? writer;

    private static void WaveInOnDataAvailable(object? sender, WaveInEventArgs e) {
        try {
            writer.Write(e.Buffer, 0, e.BytesRecorded);
            if (rec.AcceptWaveform(e.Buffer, e.BytesRecorded)) Console.WriteLine(rec.Result());
            else Console.WriteLine(rec.PartialResult());
        } catch { }
    }

    public static void Main() {
        Model model = new Model("model");
        rec = new VoskRecognizer(model, 16000f);
        WaveInEvent waveIn = new WaveInEvent();
        waveIn.WaveFormat = new WaveFormat(16000, 2);
        waveIn.DataAvailable += WaveInOnDataAvailable;
        waveIn.StartRecording();
        writer = new WaveFileWriter(@"D:\test.wav", waveIn.WaveFormat);
        while (true) { Thread.Sleep(1000); }
    }
}
The audio I write to the file plays back fine, but Vosk does not recognize any speech.
waveIn.WaveFormat = new WaveFormat(16000, 2);
You need to use 16000, 1 here, not 2. Audio has to be mono, not stereo.
I tried it, it still doesn't work
OK, please share the test.wav you recorded.
I also tried PortAudioSharp, but the sound from the microphone is distorted there, and I don't know why. Recognition does run in this case, but accuracy is poor because of the distortion.
public class VoskDemo2 {
    static StreamParameters oParams;
    static Model model = new Model("model");
    static VoskRecognizer rec = new VoskRecognizer(model, 16000f);
    static WaveFileWriter writer;

    public static void Main() {
        PortAudio.LoadNativeLibrary();
        PortAudio.Initialize();
        oParams.device = PortAudio.DefaultInputDevice;
        if (oParams.device == PortAudio.NoDevice)
            throw new Exception("No default audio input device available");
        oParams.channelCount = 1;
        oParams.sampleFormat = SampleFormat.Int16;
        var stream = new PortAudioSharp.Stream(oParams, null, 16000, 8192, StreamFlags.ClipOff, playCallback, null);
        writer = new WaveFileWriter(@"D:\test.wav", new WaveFormat(16000, 1));
        stream.Start();
        Console.WriteLine("Press any key to stop...");
        Console.ReadKey();
        stream.Stop();
    }

    private static StreamCallbackResult playCallback(IntPtr input, IntPtr output, System.UInt32 frameCount, ref StreamCallbackTimeInfo timeInfo, StreamCallbackFlags statusFlags, IntPtr dataPtr) {
        byte[] buffer = new byte[frameCount];
        Marshal.Copy(input, buffer, 0, buffer.Length);
        System.IO.Stream streamInput = new MemoryStream(buffer);
        using (System.IO.Stream source = streamInput) {
            byte[] bufferRead = new byte[frameCount];
            int bytesRead;
            while ((bytesRead = source.Read(bufferRead, 0, bufferRead.Length)) > 0) {
                writer.Write(bufferRead, 0, bytesRead);
                if (rec.AcceptWaveform(bufferRead, bytesRead)) Console.WriteLine(rec.Result());
                else Console.WriteLine(rec.PartialResult());
            }
        }
        return StreamCallbackResult.Continue;
    }
}
Source code: https://github.com/juliengabryelewicz/MicrophoneVosk Recordings from the microphone: VoskTest.zip Model: vosk-model-small-ru-0.22
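A possible cause of the distortion in the PortAudioSharp callback: PortAudio reports frameCount in sample frames, not bytes. With SampleFormat.Int16 and one channel, each frame is 2 bytes, so `new byte[frameCount]` copies only the first half of every block and the rest is dropped. The sketch below shows the size calculation. `CallbackBuffer`, `ByteCount` and `CopyInput` are hypothetical helper names used for illustration, not part of PortAudioSharp, and the defaults assume 16-bit mono input.

```csharp
using System;
using System.Runtime.InteropServices;

public static class CallbackBuffer
{
    // Bytes in one callback block: frames * channels * bytes per sample.
    public static int ByteCount(uint frameCount, int channels, int bytesPerSample)
    {
        return checked((int)frameCount * channels * bytesPerSample);
    }

    // Copies the whole native input block into a managed array.
    // Defaults are an assumption: one channel of 16-bit samples.
    public static byte[] CopyInput(IntPtr input, uint frameCount,
                                   int channels = 1, int bytesPerSample = 2)
    {
        byte[] buffer = new byte[ByteCount(frameCount, channels, bytesPerSample)];
        Marshal.Copy(input, buffer, 0, buffer.Length);
        return buffer;
    }
}
```

Inside playCallback, `byte[] buffer = CallbackBuffer.CopyInput(input, frameCount);` could then be written to the file and passed to `rec.AcceptWaveform(buffer, buffer.Length)` directly, without the intermediate MemoryStream.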
I think you should stick with the NAudio variant. The sample you recorded is stereo, not mono. Are you sure you used 1 in WaveFormat? That seems unlikely, because the recorded file wouldn't be stereo then.
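One way to check whether a recording is mono or stereo without opening an audio editor is to read the channel count from the WAV header. This is a minimal sketch that assumes a seekable stream containing a standard RIFF/WAVE file; `WavInfo.ReadChannelCount` is a hypothetical helper name, not an NAudio or Vosk API.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavInfo
{
    // Returns the channel count stored in the "fmt " chunk of a RIFF/WAVE stream.
    public static int ReadChannelCount(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.ASCII, true))
        {
            string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
            reader.ReadUInt32(); // overall RIFF size, not needed here
            string wave = Encoding.ASCII.GetString(reader.ReadBytes(4));
            if (riff != "RIFF" || wave != "WAVE")
                throw new InvalidDataException("Not a RIFF/WAVE stream");

            // Walk the chunk list; throws EndOfStreamException if no "fmt " chunk exists.
            while (true)
            {
                string chunkId = Encoding.ASCII.GetString(reader.ReadBytes(4));
                uint chunkSize = reader.ReadUInt32();
                if (chunkId == "fmt ")
                {
                    reader.ReadUInt16();        // audio format tag (1 = PCM)
                    return reader.ReadUInt16(); // number of channels
                }
                // Skip other chunks; chunk data is padded to an even length.
                stream.Seek(chunkSize + (chunkSize & 1), SeekOrigin.Current);
            }
        }
    }
}
```

For example, `WavInfo.ReadChannelCount(File.OpenRead(@"D:\test.wav"))` should return 1 for a file that Vosk can consume as-is and 2 for a stereo recording.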
I used 2 in WaveFormat because 1 does not record sound.
You can probably convert stereo to mono inside WaveInOnDataAvailable yourself. Something like this:
short[] audio = new short[e.BytesRecorded / 4];
for (int i = 0; i < audio.Length; i++)
{
    audio[i] = (BitConverter.ToSingle(buffer, i * 4) + BitConverter.ToSingle(buffer, i * 4 + 2)) / 2;
}
if (rec.AcceptWaveform(audio, audio.Length)) Console.WriteLine(rec.Result());
else Console.WriteLine(rec.PartialResult());
The code you provided throws an exception when I run it in the debugger.
using Vosk;
using NAudio.Wave;

public class VoskDemo {
    static Model model = new Model("model");
    static VoskRecognizer rec = new VoskRecognizer(model, 16000f);

    public static void Main() {
        WaveInEvent waveIn = new WaveInEvent();
        waveIn.WaveFormat = new WaveFormat(16000, 2);
        waveIn.DataAvailable += WaveInOnDataAvailable;
        waveIn.StartRecording();
        Console.ReadKey();
    }

    private static void WaveInOnDataAvailable(object? sender, WaveInEventArgs e) {
        short[] audio = new short[e.BytesRecorded / 4];
        for (int i = 0; i < audio.Length; i++)
            audio[i] = (short)((BitConverter.ToSingle(e.Buffer, i * 4) + BitConverter.ToSingle(e.Buffer, i * 4 + 2)) / 2);
        if (rec.AcceptWaveform(audio, audio.Length)) Console.WriteLine(rec.Result());
        else Console.WriteLine(rec.PartialResult());
    }
}
System.ArgumentException: "Destination array is not long enough to copy all the items in the collection. Check array index and length. Arg_ParamName_Name"
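The exception most likely comes from BitConverter.ToSingle, which reads 4 bytes as a 32-bit float. WaveFormat(16000, 2) produces 16-bit integer PCM, so each stereo frame holds two 2-byte samples, and on the last frame `ToSingle(e.Buffer, i * 4 + 2)` can need 2 bytes past the end of the buffer. Below is a minimal sketch of the downmix using BitConverter.ToInt16 instead. It assumes 16-bit interleaved stereo input on a little-endian machine, and `StereoDownmix.ToMono` is a hypothetical helper name.

```csharp
using System;

public static class StereoDownmix
{
    // Averages the left and right 16-bit samples of interleaved stereo PCM into mono.
    public static short[] ToMono(byte[] buffer, int bytesRecorded)
    {
        int frames = bytesRecorded / 4; // 2 channels * 2 bytes per sample
        short[] mono = new short[frames];
        for (int i = 0; i < frames; i++)
        {
            int left = BitConverter.ToInt16(buffer, i * 4);
            int right = BitConverter.ToInt16(buffer, i * 4 + 2);
            mono[i] = (short)((left + right) / 2); // int arithmetic avoids short overflow
        }
        return mono;
    }
}
```

In WaveInOnDataAvailable, `short[] audio = StereoDownmix.ToMono(e.Buffer, e.BytesRecorded);` followed by `rec.AcceptWaveform(audio, audio.Length)` would then feed mono samples to the recognizer.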
I dug into the settings of the microphone itself and everything worked out. The problem was not in the code. I do not know exactly what I changed, but I am glad that it works now.
@Kikibyk
Hi! Can you tell me what version of NAudio you are using in the example you posted for issue #986?
I tried the PortAudioSharp route and had problems with recognition, which may also be related to the microphone settings. Since I already have my microphone working with another voice recognition package, pocketsphinx, I did not want to readjust between the two, so I figured I'd try NAudio, but every version so far gives me a variety of errors (bad image, can't find WaveInEvent...).
Thanks, Smbika007
I tried this code, but I am getting an error:
'Destination array is not long enough to copy all the items in the collection. Check array index and length.'
It is thrown while executing this line of code:
audio[i] = (short)((BitConverter.ToSingle(e.Buffer, i * 4) + BitConverter.ToSingle(e.Buffer, i * 4 + 2)) / 2);