Added feature of transcription of local files (WAV, MP3 and M4A) along with progressbar

Open Signal46 opened this issue 3 months ago • 17 comments

Hello! This is my first contribution to an open-source project on GitHub. I’m a big fan of Handy and saw a need for a feature to handle long-form recordings safely.

I realized after working on this that there is an existing PR (https://github.com/cjpais/Handy/pull/371) for file uploads, but I thought I would make this contribution anyway since this has a progress bar.

Motivation & Context

There is a significant need for this feature in the public sector, specifically for municipalities and government agencies (e.g., here in Sweden).

Compliance & GDPR: Many organizations cannot use cloud-based transcription (which often costs ~$150/user/month) due to strict data compliance laws regarding sensitive meetings.

Local Processing: By keeping the file and transcription 100% local, Handy solves major legal hurdles regarding data transfer.

Efficiency: Estimates in a small Swedish municipality suggest transcription (including sensitive meetings) could save around 5000 administrative hours annually. Yes, 5000 hours, not 500, for a municipality of around 60,000 citizens.

Technical Details

Implementation: This feature was built with the assistance of AI coding agents but has been manually reviewed and tested.

Testing: Validated on a standard work laptop using the Parakeet V3 model. Successfully transcribed 20- and 40-minute files without memory spikes or crashes.

I am happy to make changes or discuss how this might be merged or combined with existing efforts!


Summary

Adds the ability to import existing audio files (MP3, M4A, WAV) into Handy's history with automatic transcription. This feature includes real-time progress tracking, system notifications, and robust handling of long audio files using smart chunking (VAD).

Successfully transcribed files are automatically moved to the "Recordings" folder for organization.

Motivation

Users requested the ability to transcribe existing audio files, not just live recordings. This feature enables batch processing of pre-recorded audio while maintaining the same quality and privacy guarantees as live transcription.

Changes

Core Features

File Import: Native file picker for selecting MP3, M4A, and WAV files.
Smart Chunking: Uses Voice Activity Detection (VAD) to split audio on silence, preventing words from being cut in half and improving transcription quality.
Progress Tracking: Real-time progress bar showing transcription progress (0-100%).
System Notifications: Desktop notifications on completion or failure.
Long File Support: Prevents crashes on files >20 minutes by processing in chunks.
Auto-Model Loading: Automatically loads the transcription model if it's not currently loaded.
File Management: Moves imported files to the recordings folder after successful transcription.

Technical Implementation

Backend (src-tauri/)

Audio Processing: Added decode_and_resample() using symphonia for multi-format support.
Smart Chunking: Integrated SileroVad to detect silence windows around the target chunk size (30s) in TranscriptionManager.
Import Workflow: New import_audio_file command handles decoding, resampling, smart chunking, transcription, and file management.
Events: Emits import-status and transcription-progress events for UI updates.
Database: Updated HistoryManager to store audio duration.
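
The smart chunking item above (cutting near the 30s target, but on silence) can be illustrated with a minimal sketch. It assumes a VAD has already produced one speech/silence flag per frame; the function name `split_on_silence`, its parameters, and the flag representation are hypothetical and are not taken from this PR, from vad-rs, or from transcribe-rs.

```rust
/// Hypothetical helper (not the PR's code): choose cut points, in frames,
/// close to `target` frames apart, preferring a silent frame within
/// `search` frames of each ideal cut. `speech[i]` is true when the VAD
/// marked frame `i` as speech.
fn split_on_silence(speech: &[bool], target: usize, search: usize) -> Vec<usize> {
    assert!(target > 0, "target chunk length must be at least one frame");
    let mut cuts = Vec::new();
    let mut start = 0;
    // Keep cutting while more than one target-sized chunk remains.
    while speech.len() - start > target {
        let ideal = start + target;
        // Search window around the ideal cut, clamped so every chunk is non-empty.
        let lo = ideal.saturating_sub(search).max(start + 1);
        let hi = (ideal + search).min(speech.len() - 1);
        // Take the silent frame nearest the ideal cut; if the window has no
        // silence, fall back to a hard cut at the ideal position.
        let cut = (lo..=hi)
            .filter(|&i| !speech[i])
            .min_by_key(|&i| i.abs_diff(ideal))
            .unwrap_or(ideal);
        cuts.push(cut);
        start = cut;
    }
    cuts
}
```

With one flag per second, a 70-frame input that is silent only at frames 28 and 63 yields cuts at 28 and 63 for `target = 30` and `search = 5`; an input with no silence falls back to hard cuts at 30 and 60. Chunks can therefore be up to `target + search` frames long, which is the trade-off for not splitting words.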

Frontend (src/)

UI: Added "Import Audio File" button to History settings.
Feedback: Implemented a progress bar component with percentage display.
Integration: Added event listeners for status updates and system notifications.
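
As an illustration of the 0-100% value the progress bar displays, here is a minimal sketch of how a backend could compute it before emitting a progress event. The function name `progress_percent` and the choice to weight progress by samples rather than by chunk count are assumptions made for this example (VAD-based chunks vary in length, so counting chunks would move the bar unevenly); the PR's actual calculation is not shown in this thread.

```rust
/// Hypothetical helper (not the PR's code): percentage of audio transcribed
/// so far, rounded down and clamped to 0..=100.
fn progress_percent(samples_done: usize, samples_total: usize) -> u8 {
    // Assumption: an empty file is treated as already complete.
    if samples_total == 0 {
        return 100;
    }
    // Clamp so an over-count can never push the bar past 100%.
    let done = samples_done.min(samples_total) as u128;
    // Widen to u128 so `done * 100` cannot overflow on very long files.
    (done * 100 / samples_total as u128) as u8
}
```

The caller would invoke this after each chunk finishes and pass the result along with the progress event payload.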

Dependencies

Backend: symphonia, tauri-plugin-notification, vad-rs
Frontend: @tauri-apps/plugin-notification

Testing

Tested on Windows laptop with:

✅ Short files (<1 min)
✅ Medium files (5-10 min)
✅ Long files (20+ min) - previously crashed, now works
✅ All supported formats (MP3, M4A, WAV)
✅ Model auto-loading and recovery if unloaded during transcription of long files
✅ Progress bar
✅ Notifications (success/failure)

Screenshots

[image] [Screenshot 2025-11-25 181150] [image]

Signal46 avatar Nov 25 '25 17:11 Signal46

Relates to https://github.com/cjpais/Handy/discussions/299

Signal46 avatar Nov 25 '25 17:11 Signal46

I notice this implements the actual storage of the file. I think supporting both scenarios (linking to the file path and actually copying the file to the recordings folder) would be nice, perhaps as an option the user can choose per file (copy or link), or as a general setting?

It's also nice to have support for large files. I didn't really try any long transcriptions in the other PR.

How does the 30 second batch work if it "clips" sentences/words in the middle?

On another note, I like how similar we've been thinking about where the "upload file" should be 😄

olejsc avatar Nov 25 '25 18:11 olejsc

Great question, @olejsc! I hadn't thought about that. I have now implemented smart chunking using VAD, which looks for silence around the 30-second mark and makes the cut there. Transcription quality looks a lot higher after the change; I had assumed the quality was low simply because I was transcribing in Swedish.

Regarding the file location, I was considering a future feature of automatically deleting these files for privacy / compliance reasons, but I'm not sure what the best way of doing it is.

Regarding the UI alignment: yes, that's quite surprising! I figured it would be the place requiring the fewest changes to the original code.

Signal46 avatar Nov 26 '25 08:11 Signal46

@Signal46 I'm going to review this today or tomorrow, but if you don't mind, I would prefer to have the chunking done in transcribe-rs if possible. That would streamline the code and be useful to more people, I think.

Happy to review a PR there for this too

cjpais avatar Nov 26 '25 10:11 cjpais

@cjpais thanks for your feedback! I finished moving the chunking to transcribe-rs and made a pull request for that as well. This has been tested again and is ready for review now.

Signal46 avatar Nov 27 '25 13:11 Signal46

Thank you so much for doing that. This is definitely going to take me longer to review than I expected, just because it's a major enough feature. Overall this PR looks good to me, but I need to spend some time with it before I fully pull it in. I'm currently traveling and not in a consistent place, so when I do get to review this, I want to spend a few solid hours with it, and I expect it will take at least that long. So just give me a couple of weeks, I think. But let's get this in. I just wanted to give an update that I am thinking about it.

@Signal46 if you don't mind, maybe a second PR for transcribe-rs... hahah... to move the decode and resample bits there as well, so it natively can support many more file types? I think it would also simplify the code here which would be nice. This one is a bit smaller of a change to the library because I believe it already supports transcribing files directly, so all of the decoding and resampling could happen transparently I think

I know @olejsc you have been working on #371 as well which is quite similar, but I think maybe I slightly prefer the UI here

cjpais avatar Nov 28 '25 00:11 cjpais

@cjpais Yeah that sounds good, no need to rush! 👍

I will work on moving the decode and resample bits to transcribe-rs as well, but I think it will take a few days, maybe until around Tuesday.

Signal46 avatar Nov 28 '25 10:11 Signal46

One thing that was in PR #371 was a distinct icon for each history entry, showing whether it was a recording (by the user) or an uploaded file. Is this relevant here? I personally liked having that distinction, but I must admit it doesn't really provide any huge value. @cjpais It used a microphone icon (🎙️) for recording entries and a document icon (📃) for uploaded file entries.

olejsc avatar Nov 28 '25 12:11 olejsc

I moved the decoding and resampling to transcribe-rs and committed it to the existing pull request https://github.com/cjpais/transcribe-rs/pull/14

I also added a microphone icon for recordings and a document icon for uploaded file entries in Handy.

Signal46 avatar Dec 01 '25 09:12 Signal46

This would be awesome. It would also be great to be able to record within the app itself and start / pause / stop a recording.

genesis-gh-ggarrett avatar Dec 12 '25 14:12 genesis-gh-ggarrett