nerd-dictation
nerd-dictation copied to clipboard
Allow to be run on Windows platform
I have clients who run on Windows. Thus I propose this, which consists of:
- using
fmedia
insteadparec
, by detecting that the platform is notposix
; - using thread to get sound data instead of non blocking stream, which is a
posix
specific feature; - using
pynput
keyboard simulation - the option has to be passed.
Thus I got good results for my limited tests on a Windows 8 machine. And it still works on my Linux.
I used black
and mypy
passed.
The value of size block seems important for the reactivity. A big value like 1M give too long interval between results. A lower value like 1k seems to give a too high load. I put 16k which seems a good balance.
https://github.com/papoteur-mga/nerd-dictation/blob/1fb54dc4bd8fc389dbd18001c0185257d60c9fae/nerd-dictation#L99
If this is accepted, some documentation has to be edited.
Thanks for the PR, please squish noisy changes (for example, making sample rate a hard-coded value, then using the configured value again).
Hello, Should be OK to review now.
Ran some more tests on this and responsiveness seems to have dropped a little, from looking into this - it seems time.sleep(idle_time_test)
is sleeping even when the queue has data to process.
Changing the read size in enqueue_output
from 16384
to 1024
for example, stops input entirely on my system (dictation isn't working at all).
Besides this seeming error prone in general, higher sample rates could lock up recording.
The test to idle: idle_time > 0.0:
(after calling exit_fn(...)
) should only run when the queue is empty.
I have moved the test to idle in the case where the queue of data is empty. This is indeed more responsive. Thus I use smaller chunks from the queue.
I have moved the test to idle in the case where the queue of data is empty. This is indeed more responsive. Thus I use smaller chunks from the queue.
This change makes the idle-delay not fulfill it's original purpose.
- Whenever "read" is called on the
stdout
there will likely be some data. - Continually reading a few bytes from the stdout in a loop causes high CPU usage (on one core at least).
- First reading, then idling means idle may never happen or ...
- Depend on the buffering behavior of the program generating the output.
So I believe idle should only be changed to check the queue is empty, to allow time to build up the record buffer (defaulting to 1/10th of a second, enough to feel responsive without high CPU usage).
BTW, did you check on non-blocking reads for WIN32? https://stackoverflow.com/questions/2408560/non-blocking-console-input
I have tested it on Windows 8 only, I don't have Win32 here.
I displaced the test. This still works with responsiveness.
I've been considering this PR and something that doesn't sit well with me is that it complicates the code by adding threading which is only needed for Windows, which I don't use (and can't easily test).
Using threading (in my opinion) makes the code more complex and less hackable (which is a goal of this project I would like to keep prioritized).
So I would rather the windows support instead, replace the pipe read with a file-like object that manages reads that don't block on windows. (this class can use asyncio or threading internally, whatever is needed, but it should be isolated to the class and only used on windows).
This would allow windows support without complicating logic for Linux.
I kinda agree with @ideasman42 here, but I still wanted to say thank you very much for the code @papoteur-mga Also, with this snippet, I can use almost the same config file as I do with #27
diff --git a/nerd-dictation b/nerd-dictation
index a3cbad8..7f23be6 100755
--- a/nerd-dictation
+++ b/nerd-dictation
@@ -613,6 +613,40 @@ def text_from_vosk_pipe(
# Track this to prevent excessive load when the "partial" result doesn't change.
json_text_partial_prev = ""
+ def windows_parse_key(key):
+ from pynput.keyboard import Key
+ if (key == "alt"):
+ return Key.alt
+ if (key == "control"):
+ return Key.ctrl
+ if key == "Backspace":
+ return Key.backspace
+ if key == "Tab":
+ return Key.tab
+ if key == "enter":
+ return Key.enter
+ return None
+
+ def windows_command_hack(command, kbd):
+ from pynput.keyboard import Key
+ args = command.split('+')
+ leader_key = windows_parse_key(args[0])
+ if len(args) > 1 and leader_key != None:
+ with kbd.pressed(leader_key):
+ follower_key = windows_parse_key(args[1]);
+ if follower_key == None:
+ kbd.type(args[1])
+ else:
+ kbd.press(follower_key)
+ kbd.release(follower_key)
+ return
+ else:
+ if leader_key != None:
+ kbd.press(leader_key)
+ kbd.release(leader_key)
+ return
+ kbd.type(command)
+
def handle_fn_wrapper(text: str, is_partial_arg: bool) -> None:
nonlocal handled_any
# Simple deferred text input, just accumulate values in a list (finish entering text on exit).
@@ -653,8 +687,10 @@ def text_from_vosk_pipe(
if not is_partial_arg:
if macro is not None:
+ from pynput.keyboard import Controller # type: ignore
+ kbd = Controller();
for command in macro:
- subprocess.check_output(command).decode("utf-8")
+ windows_command_hack(command, kbd)
if progressive_continuous:
text_prev = ""
else:
so I can work from my windows work machine, and my home linux machine! and only one config file. sweeeeet.