[Request] An idea for fixing realtime transcription latency

Open · DK013 opened this issue 2 years ago · 1 comment

It seems the whisper.cpp project has an example of realtime transcription in C++. Is it possible to port the same logic to C# using your D3D implementation?

And a shoutout for taking this initiative. It's a dang good project. I'm a beginner at Python as well, and the lack of DirectML support was giving me hell, so this helped a lot. But I've got a few years of C# experience, so if I can help with this project in any way, let me know. For starters, you could point me to where you set the latency values for realtime speech recognition, and I'll give solving the problem you had a try. No promises, though.

Thanks

Mar 20 '23 06:03 DK013

@DK013 The C# API already supports real-time capture; see the MicrophoneCS example.

The microphone capture logic is in the Capture::run() C++ method. As you can see, that particular function is currently hard to port to C#, because it consumes the IMFSourceReader COM interface and the VAD (voice activity detection), which is implemented in C++.
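To illustrate the kind of work a capture loop like Capture::run() has to do, here is a minimal sketch, not the project's actual code. It assumes 16 kHz mono float PCM and uses a plain RMS-energy threshold as a stand-in for the real VAD. The names VadSettings, Segment and segmentSpeech are hypothetical and not part of the library's API; in a real port the samples would arrive buffer by buffer from IMFSourceReader instead of from a ready-made vector.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tuning knobs for this sketch; NOT the real sCaptureParams fields.
struct VadSettings {
    float energyThreshold = 0.01f;   // frame RMS above this counts as speech
    std::size_t frameSamples = 160;  // 160 samples = 10 ms at 16 kHz
    std::size_t pauseFrames = 30;    // ~300 ms of silence closes a segment
    std::size_t maxFrames = 1000;    // ~10 s hard cap on segment length
};

// A segment of speech to hand to the transcriber, as sample indices [begin, end).
struct Segment {
    std::size_t begin;
    std::size_t end;
};

// Root-mean-square energy of one frame.
static float frameRms(const float* p, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; i++) sum += double(p[i]) * p[i];
    return float(std::sqrt(sum / double(n)));
}

// Walk the PCM frame by frame, the way a capture loop sees successive buffers,
// and cut a segment when a long enough pause follows speech or the cap is hit.
// A pause-closed segment includes its trailing silence.
std::vector<Segment> segmentSpeech(const std::vector<float>& pcm, const VadSettings& s) {
    assert(s.frameSamples > 0);
    std::vector<Segment> out;
    bool inSpeech = false;
    std::size_t segBegin = 0, silentFrames = 0, segFrames = 0;
    const std::size_t frames = pcm.size() / s.frameSamples;  // partial tail frame ignored
    for (std::size_t f = 0; f < frames; f++) {
        const std::size_t pos = f * s.frameSamples;
        const bool voiced = frameRms(&pcm[pos], s.frameSamples) > s.energyThreshold;
        if (!inSpeech) {
            if (!voiced) continue;  // skip leading silence
            inSpeech = true;
            segBegin = pos;
            segFrames = 0;
            silentFrames = 0;
        }
        segFrames++;
        silentFrames = voiced ? 0 : silentFrames + 1;
        if (silentFrames >= s.pauseFrames || segFrames >= s.maxFrames) {
            out.push_back({segBegin, pos + s.frameSamples});
            inSpeech = false;
        }
    }
    if (inSpeech) out.push_back({segBegin, frames * s.frameSamples});  // flush at end of stream
    return out;
}
```

The pause and cap values are exactly the kind of "magic numbers" that trade latency for accuracy: shorter pauses and caps deliver text sooner but give the model less context per segment.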

The magic numbers that affect latency are in the sCaptureParams structure, which is exposed to C#.
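The thread doesn't list the fields of sCaptureParams, so the following is only a back-of-the-envelope model of how such numbers add up. It assumes a segment is closed either after a trailing pause or at a maximum length, and is then transcribed at some real-time factor (GPU time divided by audio duration). LatencyKnobs and the functions below are hypothetical illustrations, not the library's API.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical knobs; the real sCaptureParams fields may be named and behave differently.
struct LatencyKnobs {
    double pauseSeconds;       // trailing silence that closes a segment
    double maxSegmentSeconds;  // hard cap that force-closes a long segment
    double realTimeFactor;     // transcription time divided by audio length
};

// Seconds of audio in the segment holding an utterance of `speechSeconds`:
// closed by the pause detector or by the cap, whichever comes first.
double segmentSeconds(const LatencyKnobs& k, double speechSeconds) {
    assert(k.pauseSeconds >= 0 && k.maxSegmentSeconds > 0 && k.realTimeFactor >= 0);
    return std::min(speechSeconds + k.pauseSeconds, k.maxSegmentSeconds);
}

// Delay from the first word being spoken until its text is available:
// wait for the segment to close, then wait for it to be transcribed.
double firstWordLatency(const LatencyKnobs& k, double speechSeconds) {
    const double seg = segmentSeconds(k, speechSeconds);
    return seg + seg * k.realTimeFactor;
}

// Same for the last word, valid only when the utterance fits in one segment.
double lastWordLatency(const LatencyKnobs& k, double speechSeconds) {
    assert(speechSeconds + k.pauseSeconds <= k.maxSegmentSeconds);
    const double seg = segmentSeconds(k, speechSeconds);
    return (seg - speechSeconds) + seg * k.realTimeFactor;
}
```

For example, with a 0.3 s pause, a 10 s cap and a real-time factor of 0.2, a 2 s phrase has its first word appear about 2.76 s after it was spoken and its last word about 0.76 s after, while the first word of a long monologue waits about 12 s. Under this model the cap dominates worst-case latency, and the pause length dominates how quickly short phrases show up.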

Mar 20 '23 10:03 Const-me