Using a microphone in C#

Open czvelox opened this issue 2 years ago • 11 comments

Hi! I have a problem with speech recognition through a microphone using NAudio.

public class VoskDemo {
    static VoskRecognizer? rec;
    static WaveFileWriter? writer;

    private static void WaveInOnDataAvailable(object? sender, WaveInEventArgs e) {
        try {
            writer.Write(e.Buffer, 0, e.BytesRecorded);

            if (rec.AcceptWaveform(e.Buffer, e.BytesRecorded)) Console.WriteLine(rec.Result());
            else Console.WriteLine(rec.PartialResult());
        } catch { }
    }

    public static void Main() {
        Model model = new Model("model");
        rec = new VoskRecognizer(model, 16000f);

        WaveInEvent waveIn = new WaveInEvent();
        waveIn.WaveFormat = new WaveFormat(16000, 2);
        waveIn.DataAvailable += WaveInOnDataAvailable;
        waveIn.StartRecording();

        writer = new WaveFileWriter(@"D:\test.wav", waveIn.WaveFormat);
        while (true) { Thread.Sleep(1000); }
    }
}

The recording written to the file is audible when played back, but Vosk does not recognize any speech.

czvelox avatar May 31 '22 05:05 czvelox

   waveIn.WaveFormat = new WaveFormat(16000, 2);

You need to use 16000, 1 here, not 2. Audio has to be mono, not stereo.

nshmyrev avatar May 31 '22 07:05 nshmyrev

I tried it, it still doesn't work

czvelox avatar May 31 '22 09:05 czvelox

OK, please share the test.wav you recorded.

nshmyrev avatar May 31 '22 10:05 nshmyrev

I also tried PortAudioSharp, but the sound from the microphone comes out distorted there, and I don't know what the problem might be. Recognition does run in this case, but the distortion makes the results poor.

public class VoskDemo2 {
    static StreamParameters oParams;
    static Model model = new Model("model");
    static VoskRecognizer rec = new VoskRecognizer(model, 16000f);

    static WaveFileWriter writer;
    public static void Main() {
        PortAudio.LoadNativeLibrary();
        PortAudio.Initialize();

        oParams.device = PortAudio.DefaultInputDevice;
        if (oParams.device == PortAudio.NoDevice)
            throw new Exception("No default audio input device available");

        oParams.channelCount = 1;
        oParams.sampleFormat = SampleFormat.Int16;

        var stream = new PortAudioSharp.Stream(oParams, null, 16000, 8192, StreamFlags.ClipOff, playCallback, null);
        writer = new WaveFileWriter(@"D:\test.wav", new WaveFormat(16000, 1));

        stream.Start();
        Console.WriteLine("Press any key to stop...");
        Console.ReadKey();
        stream.Stop();
    }

    private static StreamCallbackResult playCallback(IntPtr input, IntPtr output, System.UInt32 frameCount, ref StreamCallbackTimeInfo timeInfo, StreamCallbackFlags statusFlags, IntPtr dataPtr) {
        byte[] buffer = new byte[frameCount];
        Marshal.Copy(input, buffer, 0, buffer.Length);
        System.IO.Stream streamInput = new MemoryStream(buffer);
        using (System.IO.Stream source = streamInput) {
            byte[] bufferRead = new byte[frameCount];
            int bytesRead;
            while ((bytesRead = source.Read(bufferRead, 0, bufferRead.Length)) > 0) {
                writer.Write(bufferRead, 0, bytesRead);

                if (rec.AcceptWaveform(bufferRead, bytesRead)) Console.WriteLine(rec.Result());
                else Console.WriteLine(rec.PartialResult());
            }
        }

        return StreamCallbackResult.Continue;
    }
}

Source code: https://github.com/juliengabryelewicz/MicrophoneVosk Recordings from the microphone: VoskTest.zip Model: vosk-model-small-ru-0.22

czvelox avatar Jun 01 '22 21:06 czvelox
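
A possible source of the distortion in the PortAudioSharp callback above: with SampleFormat.Int16 and one channel, each frame is 2 bytes, so frameCount frames occupy frameCount * 2 bytes, yet new byte[frameCount] copies only the first half of every buffer. The sketch below is an assumption about how this could be addressed, not a fix confirmed in this thread, and ReadInt16Frames is a hypothetical helper name. It copies the native buffer straight into a short[] with Marshal.Copy, so the byte count never has to be computed by hand.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (not part of PortAudioSharp or Vosk):
// copies frameCount frames of 16-bit samples from a native input
// buffer into a managed short[] array.
static short[] ReadInt16Frames(IntPtr input, uint frameCount, int channelCount)
{
    // One frame holds one sample per channel; each sample is a 16-bit short.
    int sampleCount = checked((int)frameCount * channelCount);
    short[] samples = new short[sampleCount];

    // Marshal.Copy has an overload for short[], so the length is given
    // in samples rather than bytes.
    Marshal.Copy(input, samples, 0, sampleCount);
    return samples;
}
```

Inside the callback, the returned array could then be passed to the short[] overload of AcceptWaveform, as in rec.AcceptWaveform(samples, samples.Length).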

I think you should stick to the NAudio variant. The sample you recorded is stereo, not mono. Are you sure you used 1 in WaveFormat? That seems unlikely, because the recorded file wouldn't be stereo then.

nshmyrev avatar Jun 01 '22 21:06 nshmyrev

I used 2 in WaveFormat because 1 does not record sound.

czvelox avatar Jun 01 '22 21:06 czvelox

You can probably convert stereo to mono inside WaveInOnDataAvailable yourself. Something like this:

short[] audio = new short[e.BytesRecorded / 4];
for (int i = 0; i < audio.Length; i++)
{
    audio[i] = (BitConverter.ToSingle(buffer, i * 4) + BitConverter.ToSingle(buffer, i * 4 + 2)) / 2;
}
if (rec.AcceptWaveform(audio, audio.Length)) Console.WriteLine(rec.Result());
else Console.WriteLine(rec.PartialResult());

nshmyrev avatar Jun 01 '22 21:06 nshmyrev

The code you provided leads to an error during debugging.

using Vosk;
using NAudio.Wave;

public class VoskDemo {
    static Model model = new Model("model");
    static VoskRecognizer rec = new VoskRecognizer(model, 16000f);

    public static void Main() {
        WaveInEvent waveIn = new WaveInEvent();

        waveIn.WaveFormat = new WaveFormat(16000, 2);
        waveIn.DataAvailable += WaveInOnDataAvailable;

        waveIn.StartRecording();
        Console.ReadKey();
    }

    private static void WaveInOnDataAvailable(object? sender, WaveInEventArgs e) {
        short[] audio = new short[e.BytesRecorded / 4];
        for (int i = 0; i < audio.Length; i++) 
            audio[i] = (short)((BitConverter.ToSingle(e.Buffer, i * 4) + BitConverter.ToSingle(e.Buffer, i * 4 + 2)) / 2);

        if (rec.AcceptWaveform(audio, audio.Length)) Console.WriteLine(rec.Result());
        else Console.WriteLine(rec.PartialResult());
    }
}

System.ArgumentException: "Destination array is not long enough to copy all the items in the collection. Check array index and length. Arg_ParamName_Name"

czvelox avatar Jun 01 '22 22:06 czvelox

I dug into the settings of the microphone itself and got everything working. The problem was not in the code. I don't know exactly what I changed, but I'm glad it works now.

czvelox avatar Jun 01 '22 23:06 czvelox

@Kikibyk

Hi! Can you tell me what version of NAudio you are using in the example you posted for issue #986?

I tried the PortAudioSharp route and had problems with recognition, which may also be related to the microphone settings. My microphone already works with another voice recognition package, pocketsphinx, and I did not want to readjust it between the two, so I figured I'd try NAudio. Every version so far gives me a variety of errors (bad image, can't find WaveInEvent...).

Thanks, Smbika007

smbika007 avatar Aug 24 '22 16:08 smbika007

I tried this code, but I am getting an error:

'Destination array is not long enough to copy all the items in the collection. Check array index and length.'

It occurs while executing this line of code:

audio[i] = (short)((BitConverter.ToSingle(e.Buffer, i * 4) + BitConverter.ToSingle(e.Buffer, i * 4 + 2)) / 2);

farhantahir80 avatar Feb 28 '24 19:02 farhantahir80
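
The exception reported above most likely comes from BitConverter.ToSingle, which reads 4 bytes starting at the given index: when the buffer is full, the read at index i * 4 + 2 for the last stereo frame has only 2 bytes left and runs past the end of the array. ToSingle also interprets the bytes as a 32-bit float, whereas WaveFormat(16000, 2) delivers 16-bit integer PCM. Below is a minimal sketch of the conversion using BitConverter.ToInt16 instead, assuming 16-bit little-endian interleaved stereo input on a little-endian machine. StereoToMono is a hypothetical helper name, not part of Vosk or NAudio.

```csharp
using System;

// Hypothetical helper: averages interleaved 16-bit stereo PCM
// (left, right, left, right, ...) into mono samples.
// Each stereo frame is 4 bytes: 2 bytes left + 2 bytes right.
static short[] StereoToMono(byte[] buffer, int bytesRecorded)
{
    int frames = bytesRecorded / 4;   // a trailing partial frame is ignored
    short[] mono = new short[frames];

    for (int i = 0; i < frames; i++)
    {
        // ToInt16 reads exactly 2 bytes, so the last read ends at
        // index frames * 4 - 1 and stays inside the recorded data.
        short left = BitConverter.ToInt16(buffer, i * 4);
        short right = BitConverter.ToInt16(buffer, i * 4 + 2);

        // The sum is computed as int, so it cannot overflow before the division.
        mono[i] = (short)((left + right) / 2);
    }
    return mono;
}
```

Inside WaveInOnDataAvailable, the result could be fed to the recognizer with short[] audio = StereoToMono(e.Buffer, e.BytesRecorded); followed by rec.AcceptWaveform(audio, audio.Length).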