
Dictation Functionality

unithqcc opened this issue

  1. Does functionality like saying "comma" or "period" to type "," or "." exist within the libraries or within dsnote/Speech Note? I would like to have this ability.
  2. Similarly, is there a way for the text to appear as one single paragraph until "new paragraph" or similar command is spoken? This would be preferred as opposed to a new line after each sentence.
  3. How are hotkeys assigned? I am using a Nuance PowerMic II, and would like to start/stop recording using this device.

I am using the English (Vosk Large)/en library in my testing. I use voice-to-text software for medical dictation and writing.

Thank you in advance.

unithqcc, Sep 18 '24 14:09

Hi, thanks for the question.

  1. Does functionality like saying "comma" or "period" to type "," or "." exist within the libraries or within dsnote/Speech Note? I would like to have this ability.
  2. Similarly, is there a way for the text to appear as one single paragraph until "new paragraph" or similar command is spoken? This would be preferred as opposed to a new line after each sentence.

Unfortunately, not yet. Currently there is no support for “voice commands”. This kind of feature has been requested many times, so I think it is needed. Perhaps a simple replacement of “comma” with “,” and so on would be a good solution. Thanks for both ideas; I will try to implement something (very basic) in the next version.
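As an illustration of that "simple replacement" idea, here is a minimal Python sketch. The mapping table `SPOKEN_PUNCTUATION` and the helper `apply_spoken_punctuation` are hypothetical names chosen for this example; this is not how Speech Note implements the feature and does not reflect its configuration format.

```python
import re

# Hypothetical mapping of spoken command words to punctuation marks.
# Illustrative only; not Speech Note's rule format.
SPOKEN_PUNCTUATION = {
    "comma": ",",
    "period": ".",
    "full stop": ".",
    "question mark": "?",
    "exclamation mark": "!",
    "colon": ":",
    "semicolon": ";",
}


def apply_spoken_punctuation(text: str) -> str:
    """Replace spoken punctuation words with the symbols they name."""
    for word, mark in SPOKEN_PUNCTUATION.items():
        # \s* swallows the space before the command word so the mark attaches
        # to the previous word; the \b anchors stop "colon" from matching
        # inside "semicolon" and "comma" from matching inside "commas".
        pattern = r"\s*\b" + re.escape(word) + r"\b"
        text = re.sub(pattern, mark, text, flags=re.IGNORECASE)
    return text
```

For example, `apply_spoken_punctuation("blood pressure comma pulse comma temperature")` gives `"blood pressure, pulse, temperature"`. A blind replacement like this also converts words the speaker meant literally (for example "period" in a medical note), and it does not capitalize the following sentence, so a real rule set would need more care.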

  3. How are hotkeys assigned? I am using a Nuance PowerMic II, and would like to start/stop recording using this device.

If these buttons generate X11 key events, you can assign them to actions in the Accessibility -> Use global keyboard shortcuts setting. To find out which key code a given button generates, you can use the xev Linux tool. For instance, when I press the special "Audio stop" button on my keyboard, xev reports:

KeyRelease event, serial 40, synthetic NO, window 0x7800001,
    root 0x1e2, subw 0x0, time 1518656, (-986,770), root:(844,1744),
    state 0x0, keycode 174 (keysym 0x1008ff15, XF86AudioStop), same_screen YES,
    XLookupString gives 0 bytes: 
    XFilterEvent returns: False

In this example, XF86AudioStop is an X11 key event. To assign it to a Speech Note action, you have to enter "Stop" as the key combination. Mappings for other special keys are here.
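If you need to check several buttons, a small helper can pull the key code and keysym name out of xev's output. This is a minimal Python sketch; the function name `parse_xev_keys` is hypothetical, and the regular expression assumes the `keycode N (keysym 0x..., Name)` layout shown in the xev output above.

```python
import re

# Matches the "keycode N (keysym 0x..., Name)" fragment that xev prints
# for each KeyPress/KeyRelease event.
KEY_RE = re.compile(r"keycode (\d+) \(keysym (0x[0-9a-fA-F]+), (\w+)\)")


def parse_xev_keys(output: str) -> list[tuple[int, int, str]]:
    """Return (keycode, keysym, keysym_name) for every key event in xev output."""
    return [
        # int(..., 16) accepts the "0x" prefix that xev prints.
        (int(code), int(sym, 16), name)
        for code, sym, name in KEY_RE.findall(output)
    ]
```

For example, you could save xev's output to a file while pressing each PowerMic button and pass the file contents to `parse_xev_keys`; a button that reports a keysym name other than `NoSymbol` is a candidate for a global shortcut.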

mkiol, Sep 19 '24 17:09

The new version 4.7.0 is out and available on Flathub.

The new version includes:

  • Rules for text transformations that can be applied after Speech to Text or before Text to Speech. With Rules, you can easily and flexibly correct errors in decoded text or correct mispronounced words.

Using this new feature, you can set up rules that automatically convert "comma" to "," and so on. You can also create a rule that inserts a new paragraph when you say "new paragraph".
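The effect of a "new paragraph" rule on the single-paragraph request can be sketched outside Speech Note as well. This minimal Python sketch assumes the recognizer returns one string per sentence and that the spoken command is literally "new paragraph"; `assemble_paragraphs` is a hypothetical helper, not part of Speech Note.

```python
import re

# Hypothetical spoken command that starts a new paragraph.
NEW_PARAGRAPH = re.compile(r"\s*\bnew paragraph\b\s*", re.IGNORECASE)


def assemble_paragraphs(segments: list[str]) -> str:
    """Join recognized sentences into one paragraph, breaking only on command."""
    # Join the non-empty decoded segments with single spaces instead of newlines.
    text = " ".join(s.strip() for s in segments if s.strip())
    # Turn each spoken "new paragraph" into a blank-line paragraph break,
    # then trim any break left at the very start or end.
    return NEW_PARAGRAPH.sub("\n\n", text).strip()
```

With this approach, sentences accumulate on one line until the command is spoken, instead of each sentence starting a new line.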

Video presentation of all the new features introduced in version 4.7.0: https://www.youtube.com/watch?v=cEht4Fts6Bo

mkiol, Dec 29 '24 14:12