# feat: Implement OpenAI-style local API server for audio transcription
## Before Submitting This PR
Please confirm you have done the following:
- [X] I have searched existing issues and pull requests (including closed ones) to ensure this isn't a duplicate
- [X] I have read CONTRIBUTING.md
If this is a feature or change that was previously closed/rejected:
- [ ] I have explained in the description below why this should be reconsidered
- [ ] I have gathered community feedback (link to discussion below)
## Human Written Description
I implemented a local STT API that follows the OpenAI Whisper API format. Currently, the Whisper model is only accessible from within Handy; however, many users want to leverage this functionality for external tasks such as subtitle transcription without loading multiple model instances. This change exposes the speech-to-text capability as a standardized service, letting users do more with limited system memory. An example request is sketched below.
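For illustration, here is a minimal request sketch in the OpenAI transcription style. The host and port (`127.0.0.1:8080`) are placeholders for whatever address the local server actually binds to, and the `model` value is only an assumption following the OpenAI convention:

```bash
# Sketch of an OpenAI-style transcription request against the local server.
# Host/port and model name are placeholders, not confirmed defaults.
curl -s http://127.0.0.1:8080/v1/audio/transcriptions \
  -F file=@/path/to/audio.mp3 \
  -F model=whisper-1
```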
## Related Issues/Discussions
Fixes: none
Discussion: https://github.com/cjpais/Handy/discussions/241
## Community Feedback
https://github.com/cjpais/Handy/discussions/241
## Testing
Environment:
- Tested on: macOS 26.2 (Apple Silicon M1 Pro)
- Status: Functional on macOS. Help testing on Windows and Linux is needed to ensure consistent behavior.
Test Cases:
- Features: Tested by calling the API with `curl`, plus a demo converting MP3 to SRT (sketched after this list).
- On-demand Loading: Verified via `curl` that calling the `/v1/audio/transcriptions` endpoint correctly triggers the model loading process in the background.
- Waiting Mechanism: Confirmed the API response waits until the model is fully loaded before processing the transcription, preventing "Model not loaded" errors.
- Verified Limitations: Tested various audio formats and confirmed only MP3 currently works reliably; documented this behavior and added a "welcome PRs" note in `LOCAL_API.md` to guide future contributors.
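For reference, a sketch of the MP3-to-SRT demo mentioned above, assuming the endpoint mirrors the OpenAI API's `response_format` parameter; the host/port are again placeholders:

```bash
# Sketch of the MP3 -> SRT demo: request SRT output and save it to a file.
# Assumes the endpoint supports the OpenAI-style response_format parameter.
curl -s http://127.0.0.1:8080/v1/audio/transcriptions \
  -F file=@talk.mp3 \
  -F model=whisper-1 \
  -F response_format=srt \
  -o talk.srt
```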