psi SystemSpeechRecognizer stops before recognition is finished

Hi,

I'm using the SystemSpeechRecognizer to extract speech from an audio file, with the following code:

using var pipeline = Pipeline.Create();
var store = PsiStore.Create(pipeline, "Audio", @"C:\temp\Stores");
var audio = new WaveFileAudioSource(pipeline, path);
var recognizer = new SystemSpeechRecognizer(pipeline);
var results = new List<IStreamingSpeechRecognitionResult>();
audio.PipeTo(recognizer);
audio.Write("Original", store);
recognizer.Out.Select(x => x.Audio).Write("REC_Audio", store);
recognizer.Out.Do(x => results.Add(x));
recognizer.PartialRecognitionResults.Do(x => results.Add(x));
pipeline.Run();
foreach(var res in results)
{
    Console.WriteLine(res);
}

As you can see, I'm collecting both the partial and the final results. I'm running the code on the following audio file (from the Cognitive Services samples): Audio_File. The image below shows my results, containing both final and partial results.

Red: this is the last final result which I've received.
Blue: these are partial results which are detected afterwards

What I don't understand is why partial results are detected (blue) but I never get another final result. The pipeline stops before I get a final result. I would expect the last partial result to also be a final result. Or have I misunderstood something? Also, the RecognizeCompleted emitter never fires, so it seems to me that the recognizer didn't finish analyzing the audio.

Image: PsiStudio, final results and audio buffers

Is there anything wrong with what I've done, or have I misunderstood something?

Thanks for any help.

Dec 08 '20 17:12 tteichmeister

Thanks for reporting this, Thomas. I consider this a bug in the SystemSpeechRecognizer component. I'm working on a fix.

The explanation of the behavior is that pipelines containing finite source components such as the WaveFileAudioSource will automatically shut down when all sources have completed. Normally this would mean that the complete WAVE file would be pumped into the pipeline and allowed to drain through all of the reactive components downstream upon shutdown. However, the SystemSpeechRecognizer, while triggered by audio coming in, is not purely reactive. Instead, it emits messages asynchronously sometime after receiving audio. The result is that the pipeline shuts down out from under the recognizer while it still has outstanding async tasks. This will be fixed shortly.
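To illustrate the race in plain .NET terms, here is a rough analogy. It is not \psi code and does not reflect how the recognizer is implemented; the class name, the 500 ms delay, and the queue are all made up for the sketch. A result that arrives asynchronously is easily missed if we stop listening as soon as the source is done, and is seen if we wait for the outstanding work first.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ShutdownRaceSketch
{
    // Returns how many results were visible at "shutdown" and after waiting.
    public static (int atShutdown, int afterWait) Run()
    {
        var results = new ConcurrentQueue<string>();

        // Stand-in for the recognizer: the result is produced asynchronously,
        // some time after the input was received (the delay is arbitrary).
        Task pending = Task.Run(async () =>
        {
            await Task.Delay(TimeSpan.FromMilliseconds(500));
            results.Enqueue("final result");
        });

        // Stand-in for the pipeline shutting down as soon as the source completes.
        int atShutdown = results.Count;

        // Stand-in for holding the pipeline open until outstanding work is done.
        pending.Wait();
        int afterWait = results.Count;

        return (atShutdown, afterWait);
    }

    public static void Main()
    {
        var (atShutdown, afterWait) = Run();
        Console.WriteLine($"Results seen at shutdown: {atShutdown}");
        Console.WriteLine($"Results seen after waiting: {afterWait}");
    }
}
```

The count taken at "shutdown" depends on timing, which is exactly the problem; only the count taken after waiting is reliable.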

In the meantime, here is a workaround. Change the pipeline.Run(); line to the following:

recognizer.Out.Do(r => Console.WriteLine($"Result: {r}"));
Generators.Range(pipeline, 0, int.MaxValue, TimeSpan.FromSeconds(1)); // extra stream to hold pipeline open
pipeline.ProposeReplayTime(new TimeInterval(DateTime.UtcNow, DateTime.MaxValue)); // replay beyond finite source
pipeline.RunAsync();
Console.ReadKey();

That is, add an additional finite stream and propose a replay time that extends far beyond the WAVE file source, then run the pipeline async until the audio has all been processed (as determined by the user pressing a key).

Thanks again and we'll keep you posted on the fix.

Dec 09 '20 02:12 AshleyF

Hi @AshleyF, many thanks for your quick help. I tried the workaround and it worked.

I also tried the changes you made in your pull request locally on my machine. What I found is that if I now connect to the RecognizeCompleted emitter, the application suddenly throws an exception.

I only added this line to my existing example.

...
recognizer.RecognizeCompleted.Do(x => Console.WriteLine(x));
...

The exception didn't occur before because I wasn't using the RecognizeCompleted emitter. It seems to me that cloning the RecognizeCompletedEventArgs didn't work. The error occurred in:

Do you know what might be causing this?

Thanks in advance. Best regards, Tom

Dec 09 '20 14:12 tteichmeister

Hi @tteichmeister, it seems you've uncovered another bug.

Short answer: Perhaps avoid using this RecognizeCompleted stream until we've had a chance to fix this issue. Instead, try something like recognizer.Out.Where(r => r.IsFinal).
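A minimal sketch of that filter, assuming the same setup as in your example. The class name and reading the WAVE file path from the command line are placeholders for illustration, and the last four lines simply reuse the hold-open workaround from earlier in this thread, so the program runs until a key is pressed.

```csharp
using System;
using Microsoft.Psi;
using Microsoft.Psi.Audio;
using Microsoft.Psi.Speech;

public static class FinalResultsSketch
{
    public static void Main(string[] args)
    {
        // Hypothetical: the WAVE file path is passed as the first argument.
        string path = args[0];

        using var pipeline = Pipeline.Create();
        var audio = new WaveFileAudioSource(pipeline, path);
        var recognizer = new SystemSpeechRecognizer(pipeline);
        audio.PipeTo(recognizer);

        // Keep only final results rather than subscribing to RecognizeCompleted.
        recognizer.Out
            .Where(r => r.IsFinal)
            .Do(r => Console.WriteLine($"Result: {r}"));

        // Workaround from above: hold the pipeline open beyond the finite source.
        Generators.Range(pipeline, 0, int.MaxValue, TimeSpan.FromSeconds(1));
        pipeline.ProposeReplayTime(new TimeInterval(DateTime.UtcNow, DateTime.MaxValue));
        pipeline.RunAsync();
        Console.ReadKey();
    }
}
```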

Long answer: We've tightened down the types that are allowed to be cloned so that they no longer include messages containing IntPtrs, as this RecognizeCompletedEventArgs indirectly does. The reasoning is that this would provide a "wormhole" violating component isolation. There is a mechanism by which you can register types with a flag to override this, but I shouldn't suggest using it. I tried with this particular type, and it turns out that there is a plethora of internal types within it that would all need to be registered. If you're interested, I could list all (30+) of them for you, but I'd recommend avoiding it for now if possible.

We've opened a work item to address this in general. Likely the full fix will include getting rid of all *EventArgs messages throughout and replacing them with the underlying primitives (e.g. AudioLevelUpdatedEventArgs merely houses an int) or with types of our own.

Thanks much for flagging this!

Dec 11 '20 01:12 AshleyF
