talon-ai-tools

AI dictation mode

Open C-Loftus opened this issue 1 year ago • 5 comments

  • Use system accessibility APIs to dynamically get the proper context and automatically fix dictation and punctuation as you speak

C-Loftus avatar Mar 28 '24 18:03 C-Loftus

@jaresty Opinions on this? The idea is to keep Talon as the baseline speech-to-text engine and pass a lot of context to the model, so the model can fix up the details that need more specific formatting, such as proper nouns and punctuation.
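A minimal sketch of the context-passing idea, assuming the surrounding text has already been obtained somehow (for example from an accessibility API). The function name `build_fix_prompt`, its parameters, and the prompt wording are hypothetical and are not part of talon-ai-tools or Talon.

```python
def build_fix_prompt(dictated: str, before: str, after: str, max_context: int = 500) -> str:
    """Assemble a prompt asking a model to fix only the dictated text.

    `before` and `after` are the text surrounding the cursor; how they are
    obtained is outside this sketch (hypothetical accessibility lookup).
    """
    # Keep only the context closest to the cursor so the prompt stays small.
    before = before[-max_context:] if max_context > 0 else ""
    after = after[:max_context] if max_context > 0 else ""
    return (
        "Fix punctuation, capitalization, and proper nouns in DICTATED so it "
        "fits between BEFORE and AFTER. Return only the corrected text.\n"
        f"BEFORE: {before}\n"
        f"DICTATED: {dictated}\n"
        f"AFTER: {after}"
    )
```

The raw Talon transcription would go in `dictated`, and the model's reply would replace it in the text box.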

C-Loftus avatar Mar 28 '24 18:03 C-Loftus

We could also add a model select command (or something similar) that selects a range in an editable text box: pass all the context to the model and have it return the range, so we wouldn't need to highlight it manually. I think accessibility APIs have a ton of potential in general, but unfortunately relying on them means some fragmentation across operating systems and between Talon's beta and public releases.
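A rough sketch of the range idea, assuming the model is asked to reply with JSON of the form {"start": ..., "end": ...} giving character offsets into the text box contents. The reply format and the name `parse_range` are assumptions for illustration, not an existing command or API.

```python
import json
from typing import Optional, Tuple


def parse_range(reply: str, text: str) -> Optional[Tuple[int, int]]:
    """Parse a model reply into a (start, end) range valid for `text`.

    Returns None if the reply is not the assumed JSON shape or the
    offsets fall outside the text, so the caller can skip selecting.
    """
    try:
        data = json.loads(reply)
        start, end = int(data["start"]), int(data["end"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing keys, or a non-object reply.
        return None
    if 0 <= start <= end <= len(text):
        return start, end
    return None
```

Validating the offsets before acting on them matters here, since a model can return a range that does not exist in the current text.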

C-Loftus avatar Mar 28 '24 18:03 C-Loftus

I think this is a great idea. One less step to correct dictation!

jaresty avatar Mar 28 '24 19:03 jaresty

This is a rough idea, but is there some way to leverage the work from https://github.com/OpenInterpreter/open-interpreter @C-Loftus

4b11b4 avatar Apr 28 '24 17:04 4b11b4

> This is a rough idea, but is there some way to leverage the work from https://github.com/OpenInterpreter/open-interpreter @C-Loftus

Just curious, are there specific features in that repo you are looking for, @4b11b4? I am somewhat familiar with it, but not with the specifics. This repo should have many of the same features, but for voice. Since Talon packages are generally intended not to use external libraries, I've implemented most things from scratch.

For context (for you or anyone else viewing this): this PR is sort of blocked at the moment, since it relies on Talon's accessibility bindings, which aren't really documented and depend on an underlying Rust library that sometimes doesn't behave as intended. Without these APIs to pass additional surrounding context, real-time AI dictation fixes aren't particularly useful, and it is better to just use model fix grammar as it is currently implemented.

Let me know if you have other ideas, or if I am overlooking something that you think could help here.

C-Loftus avatar Apr 29 '24 00:04 C-Loftus

Closing this since it isn't really practical, IMO. Better to just use Copilot or Codeium. Also, axkit handles simpler context-aware punctuation well on its own on macOS, which would have been a big use case for this.

C-Loftus avatar Jul 19 '24 15:07 C-Loftus