Request: Enhance command line options

Open H-B-Schmidt opened this issue 8 months ago • 3 comments

First thank you for this great app!

I would like to use it without the GUI and call it by a script.

My use case is a video file as input and an SRT subtitle file as output.

I currently do not see a command line action which can do this.

We would then use another service to have an AI voice read the SRT file and replace the video's audio track with that voice.

This is used to anonymize educational videos at our university.

H-B-Schmidt avatar Apr 08 '25 08:04 H-B-Schmidt

I understand the need, but Speech Note is a GUI application. The command line options exist for desktop integration, not for "headless" use. Using Speech Note as a terminal program would not be very convenient, as it starts very slowly.

In my opinion, much better results can be obtained by using whisper.cpp or FasterWhisper directly. Both of these engines have easy-to-use command line tools that can be used for automation.
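For the video-to-SRT use case, a minimal sketch of such automation with whisper.cpp might look like the following. It assumes `ffmpeg` and whisper.cpp's `whisper-cli` binary (named `main` in older builds) are on the PATH; the file names and the model path are only examples, and the helper names are hypothetical. whisper.cpp expects 16 kHz mono 16-bit WAV input, so the audio is extracted first.

```python
import subprocess
from pathlib import Path


def build_commands(video, model, workdir="."):
    """Return the (ffmpeg, whisper-cli) command lines for one video file.

    Assumption: `ffmpeg` and `whisper-cli` are installed and on the PATH.
    """
    video = Path(video)
    wav = Path(workdir) / (video.stem + ".wav")
    out_base = Path(workdir) / video.stem  # whisper-cli appends ".srt"

    # Extract the audio track as 16 kHz mono 16-bit PCM WAV.
    extract = [
        "ffmpeg", "-y", "-i", str(video), "-vn",
        "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(wav),
    ]
    # Transcribe and write an SRT file next to the WAV.
    transcribe = [
        "whisper-cli", "-m", str(model), "-f", str(wav),
        "-osrt", "-of", str(out_base),
    ]
    return extract, transcribe


def video_to_srt(video, model, workdir="."):
    """Run both steps; raises CalledProcessError if either tool fails."""
    for cmd in build_commands(video, model, workdir):
        subprocess.run(cmd, check=True)
    return Path(workdir) / (Path(video).stem + ".srt")


# Example call (paths are placeholders):
# video_to_srt("lecture.mp4", "models/ggml-medium.en.bin")
```

Building the command lists separately from running them makes it easy to log or inspect the exact invocation before processing a whole batch of videos.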

mkiol avatar Apr 12 '25 17:04 mkiol

Thanks for the tip to use whisper.cpp! I installed it, but the SRT file created using 'WisperCPP Medium En' was inferior to the one I get with your fine app. Can you provide the whisper.cpp parameters you are using internally? I guess mine are suboptimal.

H-B-Schmidt avatar Apr 16 '25 09:04 H-B-Schmidt

In what respect are the results with whisper.cpp worse? Lower accuracy, slower performance?

Speech Note does not use the whisper.cpp executable, but the libwhisper.so library, so it is not easy to map the parameters one-to-one.

In general:

  • Profile "Best Performance" (default)
    • threads: 4
    • Beam search width: 1
    • Audio ctx size: Dynamic, adjusted to the duration of speech chunk
  • Profile "Best quality"
    • threads: 4
    • Beam search width: 5
    • Audio ctx size: 1500

The SRT segmentation is also Speech Note's own implementation, so it may differ from what whisper.cpp produces.
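As a rough illustration, the two profiles above could be approximated with the whisper.cpp command line tool's `-t` (threads), `-bs` (beam size) and `-ac` (audio context) flags. This is only a sketch, not an exact equivalent: the dynamic audio context of "Best Performance" has no direct CLI counterpart, so the flag is simply left out there (the CLI then uses the full context), and the subtitle segmentation will still differ. The `PROFILES` table and `profile_flags` helper are hypothetical names.

```python
# Profile values as listed above; audio_ctx=None stands for "Dynamic",
# which cannot be expressed as a fixed CLI value.
PROFILES = {
    "Best Performance": {"threads": 4, "beam_size": 1, "audio_ctx": None},
    "Best quality": {"threads": 4, "beam_size": 5, "audio_ctx": 1500},
}


def profile_flags(name):
    """Translate a profile into whisper.cpp CLI flags (approximation)."""
    p = PROFILES[name]
    flags = ["-t", str(p["threads"]), "-bs", str(p["beam_size"])]
    if p["audio_ctx"] is not None:
        flags += ["-ac", str(p["audio_ctx"])]
    return flags


# Example: append the flags to an existing whisper-cli invocation.
# cmd = ["whisper-cli", "-m", model, "-f", wav, "-osrt"] + profile_flags("Best quality")
```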

mkiol avatar Apr 17 '25 18:04 mkiol