[Feature Request] Add support for Whisper’s --initial-prompt option in settings
Description / Background
When using Handy for transcribing long Chinese audio clips, I’ve noticed that the output often lacks proper punctuation or sentence segmentation. Whisper supports an --initial-prompt parameter (also known as an “initial context” or “prompt prefix”) that allows users to guide the model’s behavior. Adding this option would significantly improve multilingual transcription, especially for Chinese, by helping Whisper produce more natural and properly punctuated text.
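For reference, this is roughly what the option does at the engine level. A minimal sketch, assuming the engine is reached through the whisper-rs bindings; the helper below and its names are illustrative, not Handy’s actual code, and the exact whisper-rs API may differ between versions:

```rust
// Minimal sketch, assuming the whisper-rs bindings are in use.
// `transcribe_with_prompt` is a hypothetical helper, not Handy's API.
use whisper_rs::{FullParams, SamplingStrategy, WhisperContext, WhisperContextParameters};

fn transcribe_with_prompt(
    model_path: &str,
    samples: &[f32],
    prompt: Option<&str>,
) -> Result<String, Box<dyn std::error::Error>> {
    let ctx = WhisperContext::new_with_params(model_path, WhisperContextParameters::default())?;
    let mut state = ctx.create_state()?;
    let mut params = FullParams::new(SamplingStrategy::Greedy { best_of: 1 });
    params.set_language(Some("zh"));
    if let Some(p) = prompt {
        // Counterpart of OpenAI whisper's --initial_prompt / whisper.cpp's --prompt:
        // the prompt tokens condition the decoder, steering punctuation and style.
        params.set_initial_prompt(p);
    }
    state.full(params, samples)?;
    let mut text = String::new();
    for i in 0..state.full_n_segments()? {
        text.push_str(&state.full_get_segment_text(i)?);
    }
    Ok(text)
}
```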
Suggestion
Add a “Whisper Initial Prompt” or “Prompt Prefix” field in the Handy settings (or configuration file), allowing users to define a custom initial prompt passed to the model.
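A minimal sketch of how such a setting could be modeled, assuming a serde-based settings struct; the struct and field names are hypothetical, not Handy’s actual config schema:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical settings struct; Handy's real schema may differ.
#[derive(Debug, Default, Serialize, Deserialize)]
pub struct TranscriptionSettings {
    /// Custom initial prompt forwarded to Whisper.
    /// `None` keeps the current behavior (no prompt).
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub whisper_initial_prompt: Option<String>,
}
```

An `Option<String>` keeps the feature opt-in: leaving the field empty changes nothing for existing users.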
This small addition would greatly improve usability for multilingual users.
Thank you for the awesome project and your hard work! 🙏
BetterDictation lets you change the prompt. I'd love this for Handy, to remove stammers and such.
Let's get this implemented and working well in cjpais/transcribe-rs; from there, I am happy to add support. I know there is an open PR on that repo, but from my testing the feature is not working, so I cannot pull in a PR that is not working.
I have re-added the unit tests and they pass. Do they run correctly on your end? https://github.com/cjpais/transcribe-rs/pull/8/commits/eecd4fc3a44319d3188d48890687b5b923e62b45
@easychen as I read them, the unit tests don't test the feature well. Can we just do simple asserts comparing the output with no prompt vs. with a prompt (a sketch follows below)? That would make it very clear and obvious that the prompt is working as expected, and it keeps the test code readable: what is being tested is immediately understandable.
Also, it would be very helpful to do this in English if possible. The OpenAI document I linked in the other thread has audio samples along with the expected transcriptions when a prompt is applied; they are a great outline for tests showing the feature works as expected. We can move this discussion there.
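For example, a minimal sketch of the kind of assert-based test I have in mind, assuming a hypothetical `transcribe(path, prompt)` helper (the real transcribe-rs API may differ); the prompt text mirrors the style-steering examples in the OpenAI guide:

```rust
// Sketch only: `transcribe` and the fixture path are hypothetical.
#[test]
fn initial_prompt_steers_output() {
    let audio = "tests/fixtures/sample_en.wav";
    let baseline = transcribe(audio, None).unwrap();
    let prompted = transcribe(audio, Some("Hello, welcome to my lecture.")).unwrap();

    // The transcripts should differ: the prompt conditions the decoder.
    assert_ne!(baseline, prompted);
    // And the prompted run should pick up the prompt's punctuation style.
    assert!(prompted.contains(','), "expected the prompt to steer punctuation");
}
```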
@easychen would you mind sending a new PR that implements this feature? I know you got it working in transcribe-rs; we can add it to the debug settings for now. If it is there, I am happy to accept the PR.