
[Feature Request] Add support for Whisper’s --initial-prompt option in settings

Open easychen opened this issue 2 months ago • 5 comments

Description / Background

When using Handy for transcribing long Chinese audio clips, I’ve noticed that the output often lacks proper punctuation or sentence segmentation. Whisper supports an --initial-prompt parameter (also known as an “initial context” or “prompt prefix”) that allows users to guide the model’s behavior. Adding this option would significantly improve multilingual transcription—especially for Chinese—by helping Whisper produce more natural and properly punctuated text.
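For illustration, here is roughly how this surfaces in the whisper-rs bindings to whisper.cpp (a sketch only; Handy's actual pipeline goes through transcribe-rs, whose API may differ):

```rust
// Sketch using the whisper-rs bindings to whisper.cpp. For illustration
// only; Handy's transcription goes through transcribe-rs, whose API differs.
use whisper_rs::{FullParams, SamplingStrategy, WhisperContext, WhisperContextParameters};

fn transcribe_chinese(audio: &[f32]) -> Result<String, Box<dyn std::error::Error>> {
    let ctx = WhisperContext::new_with_params(
        "models/ggml-base.bin",
        WhisperContextParameters::default(),
    )?;
    let mut state = ctx.create_state()?;

    let mut params = FullParams::new(SamplingStrategy::Greedy { best_of: 1 });
    params.set_language(Some("zh"));
    // A punctuated Chinese sentence as the initial prompt nudges the model
    // to emit punctuation in its own output as well.
    params.set_initial_prompt("以下是普通话的句子，使用标点符号。");

    state.full(params, audio)?;

    let mut text = String::new();
    for i in 0..state.full_n_segments()? {
        text.push_str(&state.full_get_segment_text(i)?);
    }
    Ok(text)
}
```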

Suggestion

Add a “Whisper Initial Prompt” or “Prompt Prefix” field in the Handy settings (or configuration file), allowing users to define a custom initial prompt passed to the model.
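Concretely, this could be as small as one optional field on the settings model. A hypothetical sketch (the struct and field name here are illustrative, not actual Handy code):

```rust
use serde::{Deserialize, Serialize};

// Hypothetical addition to Handy's settings model; the struct and field
// name are illustrative, not the actual Handy code.
#[derive(Serialize, Deserialize)]
struct Settings {
    // ...existing fields...
    /// Optional text passed to Whisper as the initial prompt.
    initial_prompt: Option<String>,
}
```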

This small addition would greatly improve usability for multilingual users.

Thank you for the awesome project and your hard work! 🙏

easychen · Oct 11 '25

BetterDictation lets you change the prompt. I'd love this for Handy, to remove stammers and such.

zirath · Oct 13 '25

Let's get this implemented and working well in cjpais/transcribe-rs; from there I am happy to add support. I know there is an open PR on that repo, but from my testing the feature is not working, so I cannot pull in a PR that does not work.

cjpais · Oct 16 '25

> Let's get this implemented and working well in cjpais/transcribe-rs; from there I am happy to add support. I know there is an open PR on that repo, but from my testing the feature is not working, so I cannot pull in a PR that does not work.

I have re-added the unit tests and they pass on my machine. Is there any issue running them on your end? https://github.com/cjpais/transcribe-rs/pull/8/commits/eecd4fc3a44319d3188d48890687b5b923e62b45

easychen · Oct 16 '25

@easychen as I read them, the unit tests don't exercise the feature well. Can we just do simple assertions comparing the expected output with no prompt vs. with a prompt? That would make it very clear and obvious that the prompt is working as expected, and it keeps the test code readable and understandable in terms of what is being tested.
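For example, something along these lines, where `load_test_audio` and `transcribe` are placeholders rather than the actual transcribe-rs API:

```rust
// Rough sketch of the requested assertion style; `load_test_audio` and
// `transcribe` are placeholders, not the actual transcribe-rs API.
#[test]
fn initial_prompt_changes_output() {
    let audio = load_test_audio("samples/english.wav");

    let plain = transcribe(&audio, None);
    let prompted = transcribe(&audio, Some("Hello, world. Proper punctuation, please."));

    // The prompted run should differ from the unprompted run, and should
    // reflect the style of the prompt (e.g. punctuation present).
    assert_ne!(plain, prompted);
    assert!(prompted.contains(','));
}
```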

Also, it would be very helpful to do this in English if possible. The OpenAI document I linked in the other thread has audio samples along with the expected transformations when a prompt is applied; they are a great outline of what to test to show the feature works as expected. We can move this discussion there.

cjpais · Oct 16 '25

@easychen would you mind sending a new PR that implements this feature? I know you got it implemented in transcribe-rs; we can add it to the debug settings for now. If it is there, I am happy to accept the PR.

cjpais · Nov 03 '25