Use audio or video file as input instead of microphone

Open mahor1221 opened this issue 2 years ago • 4 comments

It would be nice if there was a flag so you could convert an audio or video file to text.

At the moment I use desktop background sound as a virtual microphone with pavucontrol and it works flawlessly.

mahor1221 avatar Feb 06 '22 04:02 mahor1221

While I don't think it's a priority to support arbitrary input (this moves away from general dictation), it seems reasonable to support a --stdin command line argument that takes audio data from standard input instead of recording from a microphone. This would allow input to be piped from FFmpeg or any other command that generates audio data.
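To make the idea concrete, here is a rough Python sketch of the producer side: decoding a media file to raw PCM with FFmpeg and reading it in fixed-size chunks. The --stdin flag is only a proposal at this point, so nothing here is nerd-dictation's actual API. The helper names, the 16 kHz mono 16-bit format, and the chunk length are all assumptions chosen for illustration.

```python
import subprocess
from typing import BinaryIO, Iterator, List

SAMPLE_RATE = 16000   # assumed rate; use whatever the recognizer expects
BYTES_PER_SAMPLE = 2  # signed 16-bit PCM


def ffmpeg_pcm_command(path: str, sample_rate: int = SAMPLE_RATE) -> List[str]:
    """Build an FFmpeg command that decodes `path` to raw mono PCM on stdout."""
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-i", path,
        "-vn",                    # ignore any video stream
        "-f", "s16le",            # raw signed 16-bit little-endian samples
        "-ac", "1",               # down-mix to mono
        "-ar", str(sample_rate),  # resample
        "-",                      # write to stdout
    ]


def read_chunks(stream: BinaryIO, chunk_bytes: int) -> Iterator[bytes]:
    """Yield successive blocks of up to `chunk_bytes` until the stream ends."""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk


def decode_file(path: str, seconds_per_chunk: float = 0.25) -> Iterator[bytes]:
    """Run FFmpeg on `path` and yield its PCM output chunk by chunk."""
    chunk_bytes = int(SAMPLE_RATE * seconds_per_chunk) * BYTES_PER_SAMPLE
    proc = subprocess.Popen(ffmpeg_pcm_command(path), stdout=subprocess.PIPE)
    try:
        yield from read_chunks(proc.stdout, chunk_bytes)
    finally:
        proc.stdout.close()
        proc.wait()
```

A consumer reading from standard input could reuse read_chunks on sys.stdin.buffer, which is roughly what a --stdin option would need to do before handing the data to the recognizer.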

ideasman42 avatar Feb 06 '22 23:02 ideasman42

> At the moment I use desktop background sound as a virtual microphone with pavucontrol and it works flawlessly.

Could you go into more detail on what steps you took to accomplish this? I installed pavucontrol but can't figure out how to "use desktop background sound as a virtual microphone".

mstyp avatar Jun 01 '22 16:06 mstyp

> Could you go into more detail on what steps you took to accomplish this? I installed pavucontrol but can't figure out how to "use desktop background sound as a virtual microphone".

There is a nice explanation here: https://unix.stackexchange.com/questions/82259/how-to-pipe-audio-output-to-mic-input

And these are my settings: [screenshot: 2022-06-02_10-57-522]
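For anyone who prefers to find the right device from a script, here is a small Python sketch that lists PulseAudio monitor sources, which are the sources that carry whatever is being played on an output. It assumes pactl is installed and that `pactl list short sources` prints tab-separated fields with the source name in the second column. The function names are made up for this example.

```python
import subprocess
from typing import List


def monitor_sources(pactl_output: str) -> List[str]:
    """Return the names of monitor sources found in `pactl list short sources` output."""
    names = []
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        # Field 0 is the index, field 1 the source name.
        if len(fields) >= 2 and fields[1].endswith(".monitor"):
            names.append(fields[1])
    return names


def list_monitor_sources() -> List[str]:
    """Query the running PulseAudio server for its monitor sources."""
    out = subprocess.run(
        ["pactl", "list", "short", "sources"],
        check=True, capture_output=True, text=True,
    ).stdout
    return monitor_sources(out)
```

The resulting name is what you would pick as the recording input in pavucontrol. It may also be usable with nerd-dictation's --pulse-device-name option, if your version has it (check nerd-dictation begin --help).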

mahor1221 avatar Jun 02 '22 06:06 mahor1221

I tried the above-mentioned method for the following:

Watching a Russian TV channel and getting its speech transcribed. With the big model it didn't seem to work, but with the small model it started to transcribe, although there were some inconsistencies. I won't go into details, at least for now.

I would like to further process the transcribed text, namely translate it. I tried running https://github.com/soimort/translate-shell with trans -shell -brief, but this interactive mode only translates line by line, that is, once Enter/Return is pressed. However, as stated in the readme, nerd-dictation never presses Enter, so there's a problem. Since you can add Python scripting to manipulate the output, I guess I could add an Enter/Return press every 5 seconds, for example, couldn't I? I know the translation will be quite off, but maybe it will get better in time, so it would be nice to have the setup working.
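Something like the following is what I have in mind for the timing part, as a rough sketch only. It collects incoming words and hands back a complete line once a set interval has passed, so each line could then be sent to the translator followed by Enter. The class name and the 5 second default are made up, and how to hook it into nerd-dictation's output is not shown here and would need checking against the readme.

```python
import time
from typing import Callable, List, Optional


class TimedLineBreaker:
    """Collect words and release them as one line every `interval` seconds."""

    def __init__(self, interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.clock = clock            # injectable so the timing can be tested
        self.words: List[str] = []
        self.last_flush = clock()

    def feed(self, text: str) -> Optional[str]:
        """Add newly recognized text; return a finished line if one is due."""
        self.words.extend(text.split())
        now = self.clock()
        if self.words and now - self.last_flush >= self.interval:
            line = " ".join(self.words)
            self.words.clear()
            self.last_flush = now
            return line
        return None
```

Each returned line would be written with a trailing newline to whatever reads the text next. A fixed interval will often cut sentences in the middle, which is probably why the translation would be off.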

khlsvr avatar Aug 27 '22 18:08 khlsvr