Request: Enhance command line options
First, thank you for this great app!
I would like to use it without the GUI and call it from a script.
My use case is a video file as input and an SRT subtitle file as output.
I currently do not see a command line action which can do this.
We would then use another service to have an AI voice read the SRT file and replace the video's audio track with that voice.
This is used to anonymize educational videos at our university.
I understand the need, but Speech Note is a GUI application. The command line options exist for desktop integration, not for "headless" use. Using Speech Note as a terminal program would not be very convenient, as it starts very slowly.
In my opinion, much better results can be obtained by using whisper.cpp or FasterWhisper directly. Both of these engines have easy-to-use command line tools that can be used for automation.
Thanks for the tip to use whisper.cpp! I installed it, but the SRT file it created using 'WisperCPP Medium En' was inferior to the one I get with your fine app. Can you provide the whisper.cpp parameters you are using internally? I guess mine are suboptimal.
In what aspect are the results with whisper.cpp worse? Less accuracy, slower performance?
Speech Note does not use the whisper.cpp executable but the libwhisper.so library, so the parameters do not map one-to-one.
In general:
- Profile "Best Performance" (default)
  - threads: 4
  - Beam search width: 1
  - Audio ctx size: Dynamic, adjusted to the duration of speech chunk
- Profile "Best quality"
  - threads: 4
  - Beam search width: 5
  - Audio ctx size: 1500
The SRT segmentation is also Speech Note's own implementation, so it may differ from the results from whisper.cpp.
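As a rough starting point for scripting this outside Speech Note, the two profiles above could be approximated with the whisper.cpp command line tool. This is only a sketch under several assumptions, not what Speech Note does internally: the binary name `whisper-cli` (older builds call it `main`), the model path, and the file names are placeholders, and the flags `-t`, `-bs`, `-ac`, `-osrt` and `-of` should be checked against `--help` for your build. The dynamic audio context of "Best Performance" has no direct command line equivalent, so the sketch simply omits `-ac` for that profile, and the SRT segmentation will still differ from Speech Note's.

```python
import shlex

# Settings quoted in the thread. The "Dynamic" audio context cannot be
# reproduced from the command line, so it is stored as None and the
# flag is left out (assumption: the tool then uses its own default).
PROFILES = {
    "best_performance": {"threads": 4, "beam_size": 1, "audio_ctx": None},
    "best_quality": {"threads": 4, "beam_size": 5, "audio_ctx": 1500},
}


def ffmpeg_cmd(video, wav):
    """Build an ffmpeg command that extracts 16 kHz mono 16-bit PCM audio."""
    return ["ffmpeg", "-y", "-i", video, "-vn",
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav]


def whisper_cmd(profile, model, wav, out_base, binary="whisper-cli"):
    """Build a whisper.cpp command for one profile (binary name is assumed)."""
    p = PROFILES[profile]
    cmd = [binary, "-m", model, "-f", wav,
           "-t", str(p["threads"]), "-bs", str(p["beam_size"])]
    if p["audio_ctx"] is not None:
        cmd += ["-ac", str(p["audio_ctx"])]
    # -osrt asks for SRT output, -of sets the output path without extension.
    cmd += ["-osrt", "-of", out_base]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names and model path; adjust to your setup.
    print(shlex.join(ffmpeg_cmd("lecture.mp4", "lecture.wav")))
    print(shlex.join(whisper_cmd("best_quality",
                                 "models/ggml-medium.en.bin",
                                 "lecture.wav", "lecture")))
```

The script only prints the two commands so they can be inspected first; to run them from Python, pass each list to `subprocess.run(cmd, check=True)`.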