[Request] An idea for fixing realtime transcription latency
It seems the whisper.cpp project has an example of realtime transcription written in C++. Is it possible to port the same logic to C# using your D3D implementation?
And a shoutout for taking this initiative. It's a dang good project. I'm a beginner at Python, and the lack of DirectML support was giving me hell, so this helped a lot. I do have a few years of C# experience, though. If I can help with this project in any way, let me know. For starters, you could point me to where you set the latency values for realtime speech recognition, and I'll try to solve the problem you had. No promises, though.
Thanks
@DK013 The C# API supports real-time capture; see the MicrophoneCS example.
The logic for the microphone capture is in the Capture::run() C++ method. As you can see, it's currently hard to port that specific function to C#, because it consumes the IMFSourceReader COM interface and the VAD, which is implemented in C++.
The magic numbers affecting the latency are in the sCaptureParams structure. That structure is exposed to C#.
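To give a rough idea of why numbers like these drive latency, here is a minimal sketch of a flush decision for buffered microphone audio. It is not the code from Capture::run(): the struct, its field names, and both helper functions are hypothetical, and the real sCaptureParams layout may differ.

```cpp
#include <cassert>

// Hypothetical parameters, all in seconds. The names are illustrative only
// and are not claimed to match the fields of sCaptureParams.
struct LatencyParamsSketch
{
    float minDuration;   // never transcribe less audio than this
    float maxDuration;   // always transcribe once this much audio is buffered
    float pauseDuration; // trailing silence that ends a segment early
};

// Decide whether the buffered audio should be handed to the model.
// buffered:        seconds of audio collected so far
// trailingSilence: seconds of silence at the end of the buffer, as reported by a VAD
inline bool shouldFlush( const LatencyParamsSketch& p, float buffered, float trailingSilence )
{
    assert( p.minDuration <= p.maxDuration );
    if( buffered >= p.maxDuration )
        return true;    // hard cap reached, flush even in the middle of speech
    if( buffered < p.minDuration )
        return false;   // too little audio for a useful transcription
    return trailingSilence >= p.pauseDuration; // flush early on a long enough pause
}

// Upper bound on how long audio can sit in the buffer before transcription
// starts. Model compute time is not included.
inline float worstCaseBufferingDelay( const LatencyParamsSketch& p )
{
    return p.maxDuration;
}
```

Under these assumptions, lowering the maximum duration or the pause threshold reduces the buffering delay, at the cost of giving the model shorter segments with less context.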