feat: Add transcription of audio file

Open olejsc opened this issue 3 months ago • 3 comments

This adds basic support for uploading audio files and having them transcribed (#299).

Disclaimer: AI was used to assist with this.

Feel free to correct, adjust and tweak as much as you like.

A couple of notes:

  • Several design liberties were taken, for example in how history entries are handled and in where the upload functionality should reside. Feel free to adjust or correct them; this is really just a proof of concept.
  • I tried as best I could to avoid new dependencies. Rodio, which wraps around Symphonia, is used. The only new dependency is the dialog file picker, which was needed to open the folders.
  • NOT tested on macOS (I don't own a Mac).
  • The file size limit is set to 64 MB for the time being.
  • A copy of the file is NOT made for the time being, so it does not get copied to the recordings folder; instead, the database stores the path of the file being transcribed.
    • If the file is missing from its path, it cannot be played back (the playback button is disabled). The transcribed text remains in history, though.
    • Audio recordings now have a microphone icon on their history entry, while manual file uploads have a document icon (plus a tooltip).
  • I'm unsure how it would behave if you started another transcription (with the microphone) while it is processing a file. Edge cases are plenty, I suspect. 🤔
  • I think I managed to keep post-processing working with it, as it just reuses the existing post-processing logic.
  • The existing logic for what to do with the transcription when it's done should remain identical (copy/paste + storing in history).
  • AI helped me quite a fair bit with this: 7-9 chats with it.

Core Features:

  • Users can now upload audio files (MP3, WAV, M4A, FLAC, OGG, AAC) through a new UploadAudioButton component
  • Backend transcribe_file command handles the full transcription pipeline: decode → transcribe → post-process → save → paste
  • New decode_audio_file function converts various audio formats to 16kHz mono PCM samples using the rodio decoder
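The resampling step behind "16kHz mono PCM" can be illustrated with a small sketch. This is not the PR's `decode_audio_file`; `resample_linear` is a hypothetical helper that assumes the decoder has already produced mono `f32` samples at a known sample rate, and it uses plain linear interpolation for brevity.

```rust
/// Resample mono PCM samples with linear interpolation.
/// Hypothetical helper for illustration only; not the code from this PR.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    if from_hz == to_hz {
        return input.to_vec();
    }
    // Number of output samples covering the same duration as the input.
    let out_len = ((input.len() as u64 * to_hz as u64) / from_hz as u64) as usize;
    // Distance between output samples, measured in input samples.
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            // Clamp at the end so the final output sample never reads out of bounds.
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

Calling `resample_linear(&samples, 48_000, 16_000)` on one second of 48 kHz audio yields 16,000 samples. Linear interpolation applies no low-pass filter, so downsampling this way can alias high frequencies; a dedicated resampler is probably the safer choice if transcription quality turns out to suffer.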

History System Changes:

  • Added source_file_path column to distinguish between uploaded files and mic recordings
  • Uploaded files reference the original source file instead of creating WAV copies
  • File existence checks prevent playback errors for missing uploaded files
  • UI shows icons (FileText vs Mic) and warnings for missing source files
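A rough model of the history behavior described above, as a sketch rather than the actual schema code: `HistoryEntry`, `EntrySource`, and `can_play` are hypothetical names, and the sketch assumes `source_file_path` is empty (NULL) for microphone recordings and set for uploads.

```rust
use std::path::PathBuf;

/// Where a history entry came from (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum EntrySource {
    Microphone,
    UploadedFile,
}

/// Simplified history row; the real table has more columns.
struct HistoryEntry {
    transcript: String,
    /// Assumed to be `None` for mic recordings and `Some(path)` for uploads.
    source_file_path: Option<PathBuf>,
}

impl HistoryEntry {
    /// Decides which icon the UI would show (Mic vs FileText).
    fn source(&self) -> EntrySource {
        match self.source_file_path {
            Some(_) => EntrySource::UploadedFile,
            None => EntrySource::Microphone,
        }
    }

    /// Uploads are playable only while the original file still exists.
    /// Mic recordings are assumed here to always have their WAV copy available.
    fn can_play(&self) -> bool {
        match &self.source_file_path {
            Some(path) => path.is_file(),
            None => true,
        }
    }
}
```

With this shape, a deleted or moved upload only disables playback; the `transcript` field is unaffected, which matches the behavior described in the notes.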

User Interface:

  • Upload button integrated into History Settings with loading states and error handling
  • Real-time event system for transcription status (file-transcription-started, completed, failed)
  • Audio player component supports disabled state for missing files

Technical Details:

  • Uses tauri-plugin-dialog for file picker integration
  • File validation: 64MB size limit, supported format checking
  • Full post-processing pipeline support (LLM, Chinese variant conversion)
  • Database migration to v4 for new schema
  • The feature maintains parity with regular recordings - uploaded files receive the same post-processing and are saved to history, but skip WAV duplication to avoid unnecessary storage.
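The file validation step could look roughly like the following. This is a sketch under assumptions: `validate_upload`, `ValidationError`, and the constants are hypothetical names, "64MB" is read as 64 MiB, and the format check is done on the file extension only.

```rust
use std::path::Path;

/// Assumed interpretation of the 64MB limit: 64 MiB.
const MAX_FILE_BYTES: u64 = 64 * 1024 * 1024;

/// Formats listed in the PR description, matched by extension.
const SUPPORTED_EXTENSIONS: [&str; 6] = ["mp3", "wav", "m4a", "flac", "ogg", "aac"];

#[derive(Debug, PartialEq)]
enum ValidationError {
    UnsupportedFormat,
    TooLarge,
}

/// Checks extension and size before any decoding is attempted.
/// The size is passed in so the check stays independent of the filesystem.
fn validate_upload(path: &Path, size_bytes: u64) -> Result<(), ValidationError> {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext {
        Some(e) if SUPPORTED_EXTENSIONS.contains(&e.as_str()) => {}
        _ => return Err(ValidationError::UnsupportedFormat),
    }
    if size_bytes > MAX_FILE_BYTES {
        return Err(ValidationError::TooLarge);
    }
    Ok(())
}
```

A caller would typically obtain `size_bytes` from `std::fs::metadata(path)?.len()`. An extension check says nothing about the actual container, so the decoder can still reject a file that passes this step.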

Gif demo: demo-file-upload-audio-file-transcribe

olejsc avatar Nov 22 '25 18:11 olejsc

One more thing: I'm not sure how it fares with different audio formats. I had to make some deliberate technical choices in terms of audio processing:

  • Decode an audio file (MP3, WAV, FLAC, etc.) to mono PCM samples at 16kHz. I just went with the AI recommendation on this topic. I have no technical clue whether the quality is good. When I tested it, the transcription worked at normal levels, but it would be neat if someone with knowledge of audio processing could give their opinion on best practices here. Regarding mono/stereo, the channels simply get blended.
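For reference, "blending" stereo into mono usually means averaging the channels of each frame. The sketch below assumes interleaved `f32` samples (left, right, left, right, ...), which is how decoders such as rodio's typically hand them out; `downmix_to_mono` is a hypothetical name, not a function from this PR.

```rust
/// Average all channels of each interleaved frame into one mono sample.
/// Hypothetical helper for illustration only.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    if channels == 0 {
        return Vec::new();
    }
    interleaved
        // One chunk per frame; a trailing partial frame is dropped.
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

Averaging keeps the result within the original amplitude range, so it cannot clip. The trade-off is that content which is out of phase between the two channels partly cancels, which is rarely an issue for speech recordings.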

olejsc avatar Nov 22 '25 18:11 olejsc

I suspect this could also be used to support the issue about reprocessing transcriptions: https://github.com/cjpais/Handy/issues/125

olejsc avatar Nov 24 '25 11:11 olejsc

I'm not sure I'm ready to pull this feature in yet. I'm not sure what the best UI for it is. This is okay, but I suspect there's something a bit nicer. Not sure yet.

cjpais avatar Nov 27 '25 10:11 cjpais

closing in favor of #381

cjpais avatar Nov 28 '25 00:11 cjpais