nerd-dictation Use audio or video file as input instead of microphone

Open mahor1221 opened this issue 2 years ago • 4 comments

It would be nice if there was a flag so you could convert an audio or video file to text.

At the moment I use desktop background sound as a virtual microphone with pavucontrol and it works flawlessly.

Feb 06 '22 04:02 mahor1221

While I don't think it's a priority to support arbitrary input (this moves away from general dictation), it seems reasonable to support a --stdin command line argument that takes audio data from the standard input instead of recording from a microphone. This would allow input to be piped from FFmpeg or any other command that generates audio data.
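As a rough illustration of the piping idea, here is a minimal Python sketch that decodes an audio or video file to raw PCM with FFmpeg and writes it to standard output. The function names, the 16 kHz sample rate, and the chunk size are assumptions chosen for the example; the --stdin option itself is only a proposal in this thread, so the consuming side is not shown.

```python
import subprocess
import sys


def ffmpeg_pcm_command(path, sample_rate=16000):
    """Build an ffmpeg command that decodes `path` to raw 16-bit mono PCM on stdout.

    The sample rate is an assumption for this sketch; match it to whatever
    the consumer of the stream expects.
    """
    return [
        "ffmpeg",
        "-loglevel", "quiet",     # keep ffmpeg's log output out of the way
        "-i", path,               # input file (audio or video)
        "-vn",                    # ignore any video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample
        "-f", "s16le",            # raw signed 16-bit little-endian samples
        "-",                      # write to standard output
    ]


def stream_pcm(path, chunk_size=4000, out=None):
    """Run ffmpeg and copy its PCM output to `out` (stdout by default) in chunks."""
    out = out or sys.stdout.buffer
    with subprocess.Popen(ffmpeg_pcm_command(path), stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
```

Calling stream_pcm("talk.mp4") from a small script would emit the raw samples, which a reader of standard input could then consume.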

Feb 06 '22 23:02 ideasman42

> At the moment I use desktop background sound as a virtual microphone with pavucontrol and it works flawlessly.

Could you go into more detail on what steps you took to accomplish this? I installed pavucontrol but can't figure out how to "use desktop background sound as a virtual microphone".

Jun 01 '22 16:06 mstyp

> Could you go into more detail on what steps you took to accomplish this? I installed pavucontrol but can't figure out how to "use desktop background sound as a virtual microphone".

There is a nice explanation here: https://unix.stackexchange.com/questions/82259/how-to-pipe-audio-output-to-mic-input

And these are my settings: [screenshot: 2022-06-02_10-57-522]
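For anyone who prefers to find the right device from a script, here is a small Python sketch that lists PulseAudio "monitor" sources, which carry whatever is being played on an output device. It assumes pactl is installed and that monitor sources follow the usual naming with a ".monitor" suffix; the function names are made up for this example. The resulting name is what you would pick as the recording source, either in pavucontrol or, assuming your nerd-dictation version has it, through its --pulse-device-name option.

```python
import subprocess


def monitor_sources(pactl_output):
    """Return source names ending in '.monitor' from `pactl list short sources` output.

    Each line of that output is tab-separated: index, name, driver, sample spec, state.
    """
    names = []
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[1].endswith(".monitor"):
            names.append(fields[1])
    return names


def list_monitor_sources():
    """Ask PulseAudio for its sources and keep only the monitor ones."""
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        capture_output=True,
        text=True,
        check=True,
    )
    return monitor_sources(result.stdout)
```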

Jun 02 '22 06:06 mahor1221

I tried the above-mentioned method for the following:

Watching a Russian TV channel and getting its speech transcribed. The big model didn't seem to work, but the small model did start transcribing, although with some inconsistencies. I won't go into details, at least for now.

I would also like to process the transcribed text further, namely to translate it. I tried running https://github.com/soimort/translate-shell with trans -shell -brief, but this interactive mode translates line by line, so it only translates once Enter/Return is pressed. However, nerd-dictation never presses Enter, as the readme file also states, so there's a problem. Since you can add Python scripting to manipulate the output, I guess I could add an Enter/Return press every 5 seconds, for example, couldn't I? I know the translation will be quite off, but maybe it will get better in time, so it would be nice to have the setup working.
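One possible shape for such a script, as a hedged sketch: nerd-dictation's user configuration can define a nerd_dictation_process(text) function that receives the recognized text and returns the text to type. The version below swaps the 5-second timer for a word count, because a rule that depends only on the text gives the same result each time the function is called on a growing transcript. WORDS_PER_LINE is a hypothetical value, and whether a newline in the output actually arrives as an Enter key press depends on the input-simulation backend, which is an assumption here.

```python
# Hypothetical setting for this sketch: start a new line after this many words.
WORDS_PER_LINE = 12


def nerd_dictation_process(text):
    """Insert a line break after every WORDS_PER_LINE words of the dictated text."""
    words = text.split()
    lines = [
        " ".join(words[i:i + WORDS_PER_LINE])
        for i in range(0, len(words), WORDS_PER_LINE)
    ]
    return "\n".join(lines)
```

With this rule, a finished line is only terminated once the first word of the next line arrives, so the translator would lag by at least one word.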

Aug 27 '22 18:08 khlsvr
