
Use dictation devices for audio input when available

kristiankielhofner opened this issue 2 years ago · 2 comments

On the client side (Electron/browser) our WebHID lib supports various dictation microphones.

When one of these devices is available as an audio source, we should select it (and enforce its use) as the audio input device.

Because we initialize the lib early and already know whether a supported device is present, we can probably use that device type to select the audio source device (or even just regex-match the device label against SpeechMike or PowerMic).
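As a rough sketch of what that could look like in the browser/Electron client using only standard Web APIs (this is not the actual WIS client code; the regex and function name are illustrative):

```typescript
// Hypothetical sketch: pick the audio input whose label matches a known
// dictation mic (SpeechMike / PowerMic), falling back to the default device.
const DICTATION_MIC_PATTERN = /speechmike|powermic/i;

async function getDictationAudioStream(): Promise<MediaStream> {
  // Device labels are only populated after mic permission is granted,
  // so request a throwaway stream first and release it immediately.
  const probe = await navigator.mediaDevices.getUserMedia({ audio: true });
  probe.getTracks().forEach((t) => t.stop());

  const devices = await navigator.mediaDevices.enumerateDevices();
  const dictationMic = devices.find(
    (d) => d.kind === "audioinput" && DICTATION_MIC_PATTERN.test(d.label)
  );

  // If a supported dictation mic is present, force it as the capture device;
  // otherwise fall back to whatever the browser would pick by default.
  return navigator.mediaDevices.getUserMedia({
    audio: dictationMic ? { deviceId: { exact: dictationMic.deviceId } } : true,
  });
}
```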

kristiankielhofner commented Feb 28 '23

A poor man's approach to this, which would support more devices, would be to listen on the mic full-time and only enable transcription when a certain volume threshold is reached. Maybe this is already supported by default. I think the trick would be a configurable silence timeout (1 second?) after which willow would segment the recording.

The reason I call this a "poor man's dictation microphone" is that many mics have a Mute button, which should drop the noise floor darn close to zero. In that case, muting could be used as a proxy for the Enable button on a "real" dictation microphone.
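For illustration, a minimal sketch of that threshold-plus-timeout idea using the Web Audio API's AnalyserNode (this is not how Willow/WIS currently does voice activity detection; the threshold, polling interval, and callback names are made up):

```typescript
const VOLUME_THRESHOLD = 0.02;   // RMS level treated as "someone is speaking"
const SILENCE_TIMEOUT_MS = 1000; // segment the recording after ~1 s of silence

async function monitorMic(onSegmentStart: () => void, onSegmentEnd: () => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  await ctx.resume(); // some browsers start the context suspended

  const analyser = ctx.createAnalyser();
  analyser.fftSize = 2048;
  ctx.createMediaStreamSource(stream).connect(analyser);

  const samples = new Float32Array(analyser.fftSize);
  let speaking = false;
  let lastLoudAt = 0;

  setInterval(() => {
    analyser.getFloatTimeDomainData(samples);
    const rms = Math.sqrt(
      samples.reduce((sum, x) => sum + x * x, 0) / samples.length
    );

    if (rms >= VOLUME_THRESHOLD) {
      lastLoudAt = Date.now();
      if (!speaking) {
        speaking = true;
        onSegmentStart(); // start sending audio to WIS for transcription
      }
    } else if (speaking && Date.now() - lastLoudAt > SILENCE_TIMEOUT_MS) {
      speaking = false;
      onSegmentEnd(); // close the segment once the silence timeout elapses
    }
  }, 100);
}
```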

Apologies if my suggestion lacks context - I'm brand new to the project and still brushing up on its capabilities. I'm very excited about this project!

tensiondriven commented Jun 28 '23

This issue is intended for specific applications where users already have dedicated dictation microphones from Philips, Nuance, etc.

We have an internal desktop application that binds to a global hotkey in the OS for (basically) push-to-talk on any hardware. I'm not sure when (or if) we'll ever release it, but one of our devs worked on it for another project and it is powered by the WebRTC support in WIS.
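For context on what a global-hotkey approach might look like (this is not that internal app, just an illustrative toggle-to-talk sketch using Electron's globalShortcut; the accelerator and IPC channel name are made up):

```typescript
import { app, BrowserWindow, globalShortcut } from "electron";

let talking = false;

app.whenReady().then(() => {
  const win = new BrowserWindow({ webPreferences: { contextIsolation: true } });
  win.loadFile("index.html"); // renderer hosts the WebRTC connection to WIS

  // globalShortcut has no key-release event, so this toggles capture on each
  // press rather than true press-and-hold push-to-talk.
  globalShortcut.register("CommandOrControl+Shift+Space", () => {
    talking = !talking;
    // The renderer would start/stop streaming mic audio to WIS over WebRTC.
    win.webContents.send("wis:toggle-capture", talking);
  });
});

app.on("will-quit", () => {
  globalShortcut.unregisterAll();
});
```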

kristiankielhofner commented Jun 29 '23